Midterm Project

Author

Yang Kang Chua, Tong Shen

Published

March 5, 2023

Data Cleaning

Code
import numpy as np
import pandas as pd

pd.set_option('display.max_columns', None)

nyc311 = pd.read_csv("nyc311_011523-012123_by022023.csv")

nyc311.head()
Unique Key Created Date Closed Date Agency Agency Name Complaint Type Descriptor Location Type Incident Zip Incident Address Street Name Cross Street 1 Cross Street 2 Intersection Street 1 Intersection Street 2 Address Type City Landmark Facility Type Status Due Date Resolution Description Resolution Action Updated Date Community Board BBL Borough X Coordinate (State Plane) Y Coordinate (State Plane) Open Data Channel Type Park Facility Name Park Borough Vehicle Type Taxi Company Borough Taxi Pick Up Location Bridge Highway Name Bridge Highway Direction Road Ramp Bridge Highway Segment Latitude Longitude Location
0 56532399 01/15/2023 12:00:00 AM 01/17/2023 12:00:01 AM DOHMH Department of Health and Mental Hygiene Food Poisoning 1 or 2 Restaurant/Bar/Deli/Bakery 11101.0 28-07 JACKSON AVENUE JACKSON AVENUE ORCHARD STREET QUEENS PLAZA SOUTH ORCHARD STREET QUEENS PLAZA SOUTH ADDRESS LONG ISLAND CITY JACKSON AVENUE NaN Closed NaN The Department of Health and Mental Hygiene ha... 01/23/2023 03:10:38 PM 02 QUEENS 4.004200e+09 QUEENS 1001216.0 211957.0 ONLINE Unspecified QUEENS NaN NaN NaN NaN NaN NaN NaN 40.748433 -73.938768 (40.74843329250923, -73.93876844606294)
1 56533302 01/15/2023 12:00:00 AM 01/23/2023 12:00:01 AM DOHMH Department of Health and Mental Hygiene Food Poisoning 1 or 2 Other (Explain Below) 11101.0 48-18 NORTHERN BOULEVARD NORTHERN BOULEVARD 34 AVENUE 49 STREET 34 AVENUE 49 STREET ADDRESS LONG ISLAND CITY NORTHERN BOULEVARD NaN Closed NaN The Department of Health and Mental Hygiene re... 01/23/2023 02:53:36 PM 01 QUEENS 4.001200e+09 QUEENS 1008009.0 213852.0 ONLINE Unspecified QUEENS NaN NaN NaN NaN NaN NaN NaN 40.753619 -73.914245 (40.75361894380833, -73.91424532268624)
2 56533451 01/15/2023 12:00:00 AM 01/17/2023 09:34:08 AM DOHMH Department of Health and Mental Hygiene Food Poisoning 1 or 2 Food Cart Vendor 10305.0 1941 HYLAN BOULEVARD HYLAN BOULEVARD SEAVER AVENUE STOBE AVENUE SEAVER AVENUE STOBE AVENUE ADDRESS STATEN ISLAND HYLAN BOULEVARD NaN Closed NaN The Health Department’s Office of Environmen... 01/17/2023 09:34:02 AM 02 STATEN ISLAND NaN STATEN ISLAND 957246.0 151409.0 ONLINE Unspecified STATEN ISLAND NaN NaN NaN NaN NaN NaN NaN 40.582218 -74.097217 (40.58221767593629, -74.09721689751461)
3 56536021 01/15/2023 12:00:00 AM 01/17/2023 09:48:19 AM DOHMH Department of Health and Mental Hygiene Food Poisoning 1 or 2 Restaurant/Bar/Deli/Bakery 11103.0 24-11 STEINWAY STREET STEINWAY STREET ASTORIA BOULEVARD SOUTH 25 AVENUE ASTORIA BOULEVARD SOUTH 25 AVENUE ADDRESS ASTORIA STEINWAY STREET NaN Closed NaN The Department of Health and Mental Hygiene ha... 01/17/2023 09:48:19 AM 01 QUEENS 4.006860e+09 QUEENS 1008882.0 219338.0 ONLINE Unspecified QUEENS NaN NaN NaN NaN NaN NaN NaN 40.768674 -73.911074 (40.76867420855789, -73.91107425623979)
4 56538004 01/15/2023 12:00:00 AM 01/23/2023 03:08:02 PM DOHMH Department of Health and Mental Hygiene Food Poisoning 3 or More Other (Explain Below) 10036.0 700 8 AVENUE 8 AVENUE WEST 44 STREET WEST 45 STREET WEST 44 STREET WEST 45 STREET ADDRESS NEW YORK 8 AVENUE NaN Closed NaN NaN 01/23/2023 03:08:02 PM 05 MANHATTAN 1.010168e+09 MANHATTAN 987370.0 215678.0 PHONE Unspecified MANHATTAN NaN NaN NaN NaN NaN NaN NaN 40.758662 -73.988738 (40.75866221848322, -73.98873795996684)

Get a statistical summary

Code
nyc311.describe()
Unique Key Incident Zip BBL X Coordinate (State Plane) Y Coordinate (State Plane) Latitude Longitude
count 5.446900e+04 53720.000000 4.824400e+04 5.349500e+04 53526.000000 53491.000000 53491.000000
mean 5.656144e+07 10807.120514 2.695698e+09 1.005072e+06 207697.450342 40.736701 -73.924827
std 1.854674e+04 534.867544 1.148508e+09 2.094108e+04 32076.816492 0.088046 0.075525
min 5.652417e+07 10000.000000 0.000000e+00 9.140500e+05 121152.000000 40.498949 -74.252452
25% 5.654532e+07 10451.000000 2.027820e+09 9.934695e+05 183297.000000 40.669735 -73.966757
50% 5.656131e+07 11201.000000 3.016035e+09 1.004174e+06 205608.500000 40.730943 -73.928097
75% 5.657678e+07 11232.000000 4.000588e+09 1.017478e+06 237620.000000 40.818878 -73.880081
max 5.663566e+07 12345.000000 5.200430e+09 1.067177e+06 271730.000000 40.912468 -73.700743

Looks good! Lets take a look at the number of missing values.

Code
nyc311.isna().sum()
Unique Key                            0
Created Date                          0
Closed Date                        5176
Agency                                0
Agency Name                           0
Complaint Type                        0
Descriptor                          888
Location Type                      7388
Incident Zip                        749
Incident Address                   2584
Street Name                        2584
Cross Street 1                    19476
Cross Street 2                    19476
Intersection Street 1             21779
Intersection Street 2             21739
Address Type                        412
City                               2375
Landmark                          25674
Facility Type                     54185
Status                                0
Due Date                          54365
Resolution Description             1751
Resolution Action Updated Date     1705
Community Board                       0
BBL                                6225
Borough                               0
X Coordinate (State Plane)          974
Y Coordinate (State Plane)          943
Open Data Channel Type                0
Park Facility Name                    9
Park Borough                          0
Vehicle Type                      54460
Taxi Company Borough              54439
Taxi Pick Up Location             53900
Bridge Highway Name               54126
Bridge Highway Direction          54290
Road Ramp                         54334
Bridge Highway Segment            54126
Latitude                            978
Longitude                           978
Location                            978
dtype: int64

Get the type of the data frame

Code
nyc311.info();
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 54469 entries, 0 to 54468
Data columns (total 41 columns):
 #   Column                          Non-Null Count  Dtype  
---  ------                          --------------  -----  
 0   Unique Key                      54469 non-null  int64  
 1   Created Date                    54469 non-null  object 
 2   Closed Date                     49293 non-null  object 
 3   Agency                          54469 non-null  object 
 4   Agency Name                     54469 non-null  object 
 5   Complaint Type                  54469 non-null  object 
 6   Descriptor                      53581 non-null  object 
 7   Location Type                   47081 non-null  object 
 8   Incident Zip                    53720 non-null  float64
 9   Incident Address                51885 non-null  object 
 10  Street Name                     51885 non-null  object 
 11  Cross Street 1                  34993 non-null  object 
 12  Cross Street 2                  34993 non-null  object 
 13  Intersection Street 1           32690 non-null  object 
 14  Intersection Street 2           32730 non-null  object 
 15  Address Type                    54057 non-null  object 
 16  City                            52094 non-null  object 
 17  Landmark                        28795 non-null  object 
 18  Facility Type                   284 non-null    object 
 19  Status                          54469 non-null  object 
 20  Due Date                        104 non-null    object 
 21  Resolution Description          52718 non-null  object 
 22  Resolution Action Updated Date  52764 non-null  object 
 23  Community Board                 54469 non-null  object 
 24  BBL                             48244 non-null  float64
 25  Borough                         54469 non-null  object 
 26  X Coordinate (State Plane)      53495 non-null  float64
 27  Y Coordinate (State Plane)      53526 non-null  float64
 28  Open Data Channel Type          54469 non-null  object 
 29  Park Facility Name              54460 non-null  object 
 30  Park Borough                    54469 non-null  object 
 31  Vehicle Type                    9 non-null      object 
 32  Taxi Company Borough            30 non-null     object 
 33  Taxi Pick Up Location           569 non-null    object 
 34  Bridge Highway Name             343 non-null    object 
 35  Bridge Highway Direction        179 non-null    object 
 36  Road Ramp                       135 non-null    object 
 37  Bridge Highway Segment          343 non-null    object 
 38  Latitude                        53491 non-null  float64
 39  Longitude                       53491 non-null  float64
 40  Location                        53491 non-null  object 
dtypes: float64(6), int64(1), object(34)
memory usage: 17.0+ MB

Check missing value percentage

Code
# Find missing value
missing_value = nyc311.isnull()

#Divide the full data length to get the percentage
missing_value = missing_value.sum()/len(nyc311)

#Convert into dataframe
missing_value = missing_value.to_frame()

#Name the column
missing_value.columns = ['Missing Percentage']

#Display
missing_value
Missing Percentage
Unique Key 0.000000
Created Date 0.000000
Closed Date 0.095027
Agency 0.000000
Agency Name 0.000000
Complaint Type 0.000000
Descriptor 0.016303
Location Type 0.135637
Incident Zip 0.013751
Incident Address 0.047440
Street Name 0.047440
Cross Street 1 0.357561
Cross Street 2 0.357561
Intersection Street 1 0.399842
Intersection Street 2 0.399108
Address Type 0.007564
City 0.043603
Landmark 0.471351
Facility Type 0.994786
Status 0.000000
Due Date 0.998091
Resolution Description 0.032147
Resolution Action Updated Date 0.031302
Community Board 0.000000
BBL 0.114285
Borough 0.000000
X Coordinate (State Plane) 0.017882
Y Coordinate (State Plane) 0.017313
Open Data Channel Type 0.000000
Park Facility Name 0.000165
Park Borough 0.000000
Vehicle Type 0.999835
Taxi Company Borough 0.999449
Taxi Pick Up Location 0.989554
Bridge Highway Name 0.993703
Bridge Highway Direction 0.996714
Road Ramp 0.997522
Bridge Highway Segment 0.993703
Latitude 0.017955
Longitude 0.017955
Location 0.017955

Convert the time to date time format.

Code
nyc311['Created Date'] = pd.to_datetime(nyc311['Created Date'], errors='coerce')
nyc311['Closed Date'] = pd.to_datetime(nyc311['Closed Date'], errors='coerce')

Let take a look on if Closed Date earlier than Created Date.

Code
# Create a filter to find if the Closed Date is earlier and Created Date

Filter_1 = nyc311[nyc311['Created Date'] > nyc311['Closed Date']]

Filter_1.head()
Unique Key Created Date Closed Date Agency Agency Name Complaint Type Descriptor Location Type Incident Zip Incident Address Street Name Cross Street 1 Cross Street 2 Intersection Street 1 Intersection Street 2 Address Type City Landmark Facility Type Status Due Date Resolution Description Resolution Action Updated Date Community Board BBL Borough X Coordinate (State Plane) Y Coordinate (State Plane) Open Data Channel Type Park Facility Name Park Borough Vehicle Type Taxi Company Borough Taxi Pick Up Location Bridge Highway Name Bridge Highway Direction Road Ramp Bridge Highway Segment Latitude Longitude Location
15759 56545596 2023-01-17 09:13:00 2023-01-16 09:12:00 DOT Department of Transportation Street Light Condition Fixture/Luminaire Door Open NaN 10470.0 NaN NaN NaN NaN OSMAN PLACE EAST 241 STREET INTERSECTION BRONX NaN NaN Pending NaN Service Request status for this request is ava... 01/16/2023 09:12:00 AM 12 BRONX NaN BRONX 1026374.0 267996.0 UNKNOWN Unspecified BRONX NaN NaN NaN NaN NaN NaN NaN 40.902160 -73.847620 (40.902160080380305, -73.8476196608694)
16346 56548261 2023-01-17 10:14:00 2023-01-16 09:37:00 DOT Department of Transportation Street Light Condition Street Light Out NaN 10470.0 NaN NaN NaN NaN WHITE PLAINS ROAD EAST 241 STREET INTERSECTION BRONX NaN NaN Pending NaN Service Request status for this request is ava... 01/16/2023 09:37:00 AM 12 BRONX NaN BRONX 1025621.0 268476.0 UNKNOWN Unspecified BRONX NaN NaN NaN NaN NaN NaN NaN 40.903481 -73.850341 (40.903481086856935, -73.85034060870207)
16374 56546592 2023-01-17 10:16:00 2023-01-16 10:15:00 DOT Department of Transportation Street Light Condition Street Light Out NaN 10470.0 NaN NaN NaN NaN EAST 239 STREET MATILDA AVENUE INTERSECTION BRONX NaN NaN Pending NaN Service Request status for this request is ava... 01/16/2023 10:15:00 AM 12 BRONX NaN BRONX 1024368.0 267516.0 UNKNOWN Unspecified BRONX NaN NaN NaN NaN NaN NaN NaN 40.900852 -73.854879 (40.90085198617899, -73.85487907500377)
16392 56552546 2023-01-17 10:18:00 2023-01-16 10:17:00 DOT Department of Transportation Street Light Condition Fixture/Luminaire Out Of Position NaN 10459.0 NaN NaN NaN NaN SOUTHERN BOULEVARD JENNINGS STREET INTERSECTION BRONX NaN NaN Pending NaN Service Request status for this request is ava... 01/16/2023 10:17:00 AM 03 BRONX NaN BRONX 1014492.0 242563.0 UNKNOWN Unspecified BRONX NaN NaN NaN NaN NaN NaN NaN 40.832403 -73.890717 (40.83240302209445, -73.890716609599)
16544 56549764 2023-01-17 10:28:00 2023-01-16 10:27:00 DOT Department of Transportation Street Light Condition Fixture/Luminaire Hanging NaN 10457.0 NaN NaN NaN NaN EAST 180 STREET PROSPECT AVENUE INTERSECTION BRONX NaN NaN Pending NaN Service Request status for this request is ava... 01/16/2023 10:27:00 AM 06 BRONX NaN BRONX 1015598.0 247912.0 UNKNOWN Unspecified BRONX NaN NaN NaN NaN NaN NaN NaN 40.847081 -73.886695 (40.84708060167133, -73.88669493748064)

Few things we can do to clean the data:

  • Remove the rows in which Closed Date is earlier than Created Date.
  • Missing Value from Incident Zip can be found using Latitude and Longitude using uszipcode.
  • Latitude and Longitude can be found using the available Incident Address, Street Name, Cross Street 1, Cross Street 2, Intersection Street 1, Intersection Street 2 or as known as Address Type.
  • We can get X Coordinate (State Plane)and Y Coordinate (State Plane) using stateplane.
  • Update Borough from Incident Zip.

Begin by removing the rows where Closed Date is earlier than `Created Date``.

Code
nyc311 = nyc311[nyc311['Created Date'] < nyc311['Closed Date']]

Now work on getting Latitude and Longitude can be found using the available Incident Address, Street Name, Cross Street 1, Cross Street 2, Intersection Street 1, Intersection Street 2 or as known as Address Type.

Code
Filter_1 = nyc311[nyc311['Latitude'].isnull()]

pd.crosstab(Filter_1["Incident Address"].isnull(), Filter_1["Latitude"].isnull())
pd.crosstab(Filter_1["Street Name"].isnull(), Filter_1["Latitude"].isnull())
Latitude True
Street Name
False 499
True 270

Begin with solving Address Type that is INTERSECTION.

Code
# Get the remaining data that is missing with Latitude
Filter_1 = nyc311[nyc311['Latitude'].isnull()]

# Get the data that contain INTERSECTION
Filter_1 = Filter_1[Filter_1['Address Type'] =='INTERSECTION']

from geopy.geocoders import Nominatim
from geopy.geocoders import GoogleV3
from geopy.extra.rate_limiter import RateLimiter

Intersection = Filter_1['Intersection Street 1'] + ' @ ' + Filter_1['Intersection Street 2'] + ' NY'

for index in Intersection.index:
    geocode = RateLimiter(GoogleV3(api_key='AIzaSyDhsJ5roz0w0ll0YegLJZn-niSBjm1ns5A').geocode, min_delay_seconds=1)
    location = geocode(Intersection[index], timeout = 10)
    if location is not None:
        Filter_1.loc[index,['Latitude',"Longitude"]] = [location.latitude,location.longitude]
        
# Update the data
for index in Filter_1.index:
    nyc311.loc[index,'Latitude']   = Filter_1['Latitude'][index]
    nyc311.loc[index,'Longitude'] = Filter_1['Longitude'][index]

Then work with address type BLOCKFACE.

Code
Filter_1 = nyc311[nyc311['Latitude'].isnull()]

Filter_1 = Filter_1[Filter_1['Address Type'] == 'BLOCKFACE']

Cross_Street = Filter_1['Cross Street 1'] + ' @ ' + Filter_1['Cross Street 2'] + ' NY '

for index in Cross_Street.index:
    if not Filter_1['Cross Street 1'][index] == Filter_1['Cross Street 2'][index]:
        if Filter_1['Cross Street 2'][index] is not np.nan:
            if Filter_1['Cross Street 2'][index] != 'SEE COMMENTS FOR CROSS ST':
                geocode = RateLimiter(GoogleV3(api_key='AIzaSyDhsJ5roz0w0ll0YegLJZn-niSBjm1ns5A').geocode, min_delay_seconds=1)
                location = geocode(Cross_Street[index], timeout = 10)
                if location is not None:
                    Filter_1.loc[index,['Latitude',"Longitude"]] = [location.latitude,location.longitude]

# Use the Street Name and Cross Street 1 to get the latitude and longitude
Filter_2 = Filter_1[Filter_1['Latitude'].isnull()]

Street_Cross = Filter_2['Street Name'] + ' @ ' + Filter_2['Cross Street 1'] + ' NY '

for index in Street_Cross.index:
    geocode = RateLimiter(GoogleV3(api_key='AIzaSyDhsJ5roz0w0ll0YegLJZn-niSBjm1ns5A').geocode, min_delay_seconds=1)
    location = geocode(Street_Cross[index], timeout = 10)
    if location is not None:
        Filter_1.loc[index,['Latitude',"Longitude"]] = [location.latitude,location.longitude]

for index in Filter_1.index:
    nyc311.loc[index,'Latitude']   = Filter_1['Latitude'][index]
    nyc311.loc[index,'Longitude'] = Filter_1['Longitude'][index]

Work on address type ADDRESS.

Code
Filter_1 = nyc311[nyc311['Latitude'].isnull()]
Filter_1 = Filter_1[Filter_1['Address Type'] == 'ADDRESS']

Address = Filter_1['Incident Address'] + ' NY '

for index in Address.index:
    geocode = RateLimiter(GoogleV3(api_key='AIzaSyDhsJ5roz0w0ll0YegLJZn-niSBjm1ns5A').geocode, min_delay_seconds=1)
    location = geocode(Address[index], timeout = 10)
    if location is not None:
        Filter_1.loc[index,['Latitude',"Longitude"]] = [location.latitude,location.longitude]

# Update
for index in Filter_1.index:
    nyc311.loc[index,'Latitude']   = Filter_1['Latitude'][index]
    nyc311.loc[index,'Longitude'] = Filter_1['Longitude'][index]

Solve the remaining address.

Code
# Solve the remaining address

Filter_1 = nyc311[nyc311['Latitude'].isnull()]

Filter_1 = Filter_1[Filter_1['Incident Address'].notnull()]

Address = Filter_1['Incident Address'] + ' NY '

for index in Address.index:
    geocode = RateLimiter(GoogleV3(api_key='AIzaSyDhsJ5roz0w0ll0YegLJZn-niSBjm1ns5A').geocode, min_delay_seconds=1)
    location = geocode(Address[index], timeout = 10)
    if location is not None:
        Filter_1.loc[index,['Latitude',"Longitude"]] = [location.latitude,location.longitude]

for index in Filter_1.index:
    nyc311.loc[index,'Latitude']   = Filter_1['Latitude'][index]
    nyc311.loc[index,'Longitude'] = Filter_1['Longitude'][index]

These are the remaining latitude and longitude that is missing

Code
# Show the remaining data that has missing Latitude and Longitude

Filter_1 = nyc311[nyc311['Latitude'].isnull()]

Filter_1
Unique Key Created Date Closed Date Agency Agency Name Complaint Type Descriptor Location Type Incident Zip Incident Address Street Name Cross Street 1 Cross Street 2 Intersection Street 1 Intersection Street 2 Address Type City Landmark Facility Type Status Due Date Resolution Description Resolution Action Updated Date Community Board BBL Borough X Coordinate (State Plane) Y Coordinate (State Plane) Open Data Channel Type Park Facility Name Park Borough Vehicle Type Taxi Company Borough Taxi Pick Up Location Bridge Highway Name Bridge Highway Direction Road Ramp Bridge Highway Segment Latitude Longitude Location
1067 56533187 2023-01-15 05:39:14 2023-01-17 13:14:50 DOT Department of Transportation Ferry Complaint Delays Ferry NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN Closed NaN The Department of Transportation contacted the... 01/17/2023 01:14:56 PM 0 Unspecified NaN Unspecified NaN NaN ONLINE Unspecified Unspecified NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
4128 56537038 2023-01-15 15:32:05 2023-01-18 09:24:29 TLC Taxi and Limousine Commission Lost Property Bag/Wallet NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN Closed NaN The Taxi and Limousine Commission (TLC) has de... 01/18/2023 09:24:24 AM 0 Unspecified NaN Unspecified NaN NaN MOBILE Unspecified Unspecified NaN NaN 1000 10th Ave, New York, NY 10019, USA NaN NaN NaN NaN NaN NaN NaN
4364 56535371 2023-01-15 16:12:00 2023-01-18 07:00:41 DOT Department of Transportation Highway Condition Pothole - Highway Highway NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN Closed NaN The Department of Transportation has completed... 01/18/2023 07:00:46 AM Unspecified BROOKLYN NaN BROOKLYN NaN NaN PHONE Unspecified BROOKLYN NaN NaN NaN Belt Pkwy East/Queens Bound Roadway Coney Island Ave (Exit 8) - Knapp St Sheepshea... NaN NaN NaN
4837 56536597 2023-01-15 17:30:03 2023-01-20 15:41:43 DOT Department of Transportation Highway Sign - Dangling Other/Unknown Highway NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN Closed NaN The Department of Transportation has completed... 01/20/2023 03:41:48 PM Unspecified BROOKLYN NaN BROOKLYN NaN NaN ONLINE Unspecified BROOKLYN NaN NaN NaN Belt Pkwy West/Staten Island Bound Roadway Pennsylvania Ave (Exit 14) - Rockaway Pkwy (Ex... NaN NaN NaN
6096 56532447 2023-01-15 21:40:23 2023-02-07 13:49:37 DOT Department of Transportation Highway Condition Loose Plate Highway NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN Closed NaN The condition was inspected and determined not... 02/07/2023 01:49:44 PM Unspecified BRONX NaN BRONX NaN NaN PHONE Unspecified BRONX NaN NaN NaN Throgs Neck Expwy South/Queens Bound Roadway Bruckner Expwy (I95 N) - Randall Ave NaN NaN NaN
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
49656 56585163 2023-01-21 10:10:44 2023-01-31 07:01:47 DOT Department of Transportation Highway Condition Pothole - Highway Highway NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN Closed NaN The Department of Transportation has completed... 01/31/2023 07:01:53 AM Unspecified BROOKLYN NaN BROOKLYN NaN NaN PHONE Unspecified BROOKLYN NaN NaN NaN Belt Pkwy West/Staten Island Bound Roadway Rockaway Pkwy (Exit 13) - Flatbush Ave (Exit 11N) NaN NaN NaN
50143 56586047 2023-01-21 11:26:56 2023-01-24 08:33:48 DOT Department of Transportation Highway Condition Pothole - Highway Highway NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN Closed NaN The Department of Transportation has completed... 01/24/2023 08:33:52 AM Unspecified QUEENS NaN QUEENS NaN NaN PHONE Unspecified QUEENS NaN NaN NaN Long Island Expwy West/Manhattan Bound Roadway Junction Blvd (Exit 20) - Queens Blvd (NY 25) ... NaN NaN NaN
50169 56587808 2023-01-21 11:31:51 2023-01-24 08:32:50 DOT Department of Transportation Highway Condition Pothole - Highway Highway NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN Closed NaN The Department of Transportation has completed... 01/24/2023 08:32:53 AM Unspecified QUEENS NaN QUEENS NaN NaN PHONE Unspecified QUEENS NaN NaN NaN Long Island Expwy West/Manhattan Bound Roadway Junction Blvd (Exit 20) - Queens Blvd (NY 25) ... NaN NaN NaN
53940 56589030 2023-01-21 22:47:03 2023-01-22 00:04:27 NYPD New York City Police Department Traffic Drag Racing Highway NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN Closed NaN The Police Department reviewed your complaint ... 01/22/2023 12:04:34 AM Unspecified BROOKLYN NaN BROOKLYN NaN NaN ONLINE Unspecified BROOKLYN NaN NaN NaN Belt Pkwy West/Staten Island Bound Roadway Cross Bay Blvd (Exit 17S) - Erskine St (Exit 15) NaN NaN NaN
53985 56587843 2023-01-21 22:55:35 2023-01-24 08:32:23 DOT Department of Transportation Highway Condition Pothole - Highway Highway NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN Closed NaN The Department of Transportation has completed... 01/24/2023 08:32:27 AM Unspecified QUEENS NaN QUEENS NaN NaN ONLINE Unspecified QUEENS NaN NaN NaN Long Island Expwy West/Manhattan Bound Roadway Junction Blvd (Exit 20) - Queens Blvd (NY 25) ... NaN NaN NaN

159 rows × 41 columns

Most of the latitude are due to most location is at hightway.

Now work on getting zipcode from available latitude and longitude.

Code
from uszipcode import SearchEngine

# Create a SearchEngine object
search = SearchEngine()

Filter_1 = nyc311[nyc311['Incident Zip'].isnull()]

Filter_1 = Filter_1[Filter_1['Latitude'].notnull()]

for index in Filter_1.index:
    z = search.by_coordinates(Filter_1['Latitude'][index],Filter_1['Longitude'][index], radius = 25)
    Filter_1.loc[index,'Incident Zip'] = z[0].zipcode

# Make an update
for index in Filter_1.index:
    nyc311.loc[index,'Incident Zip']   = Filter_1['Incident Zip'][index]
/home/myc/applications/miniconda3/lib/python3.10/site-packages/fuzzywuzzy/fuzz.py:11: UserWarning: Using slow pure-python SequenceMatcher. Install python-Levenshtein to remove this warning
  warnings.warn('Using slow pure-python SequenceMatcher. Install python-Levenshtein to remove this warning')

The remaining missing date still shown here, this means that the rest of the Incident Zip is filled!

Code
Filter_1 = nyc311[nyc311['Incident Zip'].isnull()]

Filter_1
Unique Key Created Date Closed Date Agency Agency Name Complaint Type Descriptor Location Type Incident Zip Incident Address Street Name Cross Street 1 Cross Street 2 Intersection Street 1 Intersection Street 2 Address Type City Landmark Facility Type Status Due Date Resolution Description Resolution Action Updated Date Community Board BBL Borough X Coordinate (State Plane) Y Coordinate (State Plane) Open Data Channel Type Park Facility Name Park Borough Vehicle Type Taxi Company Borough Taxi Pick Up Location Bridge Highway Name Bridge Highway Direction Road Ramp Bridge Highway Segment Latitude Longitude Location
1067 56533187 2023-01-15 05:39:14 2023-01-17 13:14:50 DOT Department of Transportation Ferry Complaint Delays Ferry NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN Closed NaN The Department of Transportation contacted the... 01/17/2023 01:14:56 PM 0 Unspecified NaN Unspecified NaN NaN ONLINE Unspecified Unspecified NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
4128 56537038 2023-01-15 15:32:05 2023-01-18 09:24:29 TLC Taxi and Limousine Commission Lost Property Bag/Wallet NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN Closed NaN The Taxi and Limousine Commission (TLC) has de... 01/18/2023 09:24:24 AM 0 Unspecified NaN Unspecified NaN NaN MOBILE Unspecified Unspecified NaN NaN 1000 10th Ave, New York, NY 10019, USA NaN NaN NaN NaN NaN NaN NaN
4364 56535371 2023-01-15 16:12:00 2023-01-18 07:00:41 DOT Department of Transportation Highway Condition Pothole - Highway Highway NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN Closed NaN The Department of Transportation has completed... 01/18/2023 07:00:46 AM Unspecified BROOKLYN NaN BROOKLYN NaN NaN PHONE Unspecified BROOKLYN NaN NaN NaN Belt Pkwy East/Queens Bound Roadway Coney Island Ave (Exit 8) - Knapp St Sheepshea... NaN NaN NaN
4837 56536597 2023-01-15 17:30:03 2023-01-20 15:41:43 DOT Department of Transportation Highway Sign - Dangling Other/Unknown Highway NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN Closed NaN The Department of Transportation has completed... 01/20/2023 03:41:48 PM Unspecified BROOKLYN NaN BROOKLYN NaN NaN ONLINE Unspecified BROOKLYN NaN NaN NaN Belt Pkwy West/Staten Island Bound Roadway Pennsylvania Ave (Exit 14) - Rockaway Pkwy (Ex... NaN NaN NaN
6096 56532447 2023-01-15 21:40:23 2023-02-07 13:49:37 DOT Department of Transportation Highway Condition Loose Plate Highway NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN Closed NaN The condition was inspected and determined not... 02/07/2023 01:49:44 PM Unspecified BRONX NaN BRONX NaN NaN PHONE Unspecified BRONX NaN NaN NaN Throgs Neck Expwy South/Queens Bound Roadway Bruckner Expwy (I95 N) - Randall Ave NaN NaN NaN
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
49656 56585163 2023-01-21 10:10:44 2023-01-31 07:01:47 DOT Department of Transportation Highway Condition Pothole - Highway Highway NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN Closed NaN The Department of Transportation has completed... 01/31/2023 07:01:53 AM Unspecified BROOKLYN NaN BROOKLYN NaN NaN PHONE Unspecified BROOKLYN NaN NaN NaN Belt Pkwy West/Staten Island Bound Roadway Rockaway Pkwy (Exit 13) - Flatbush Ave (Exit 11N) NaN NaN NaN
50143 56586047 2023-01-21 11:26:56 2023-01-24 08:33:48 DOT Department of Transportation Highway Condition Pothole - Highway Highway NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN Closed NaN The Department of Transportation has completed... 01/24/2023 08:33:52 AM Unspecified QUEENS NaN QUEENS NaN NaN PHONE Unspecified QUEENS NaN NaN NaN Long Island Expwy West/Manhattan Bound Roadway Junction Blvd (Exit 20) - Queens Blvd (NY 25) ... NaN NaN NaN
50169 56587808 2023-01-21 11:31:51 2023-01-24 08:32:50 DOT Department of Transportation Highway Condition Pothole - Highway Highway NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN Closed NaN The Department of Transportation has completed... 01/24/2023 08:32:53 AM Unspecified QUEENS NaN QUEENS NaN NaN PHONE Unspecified QUEENS NaN NaN NaN Long Island Expwy West/Manhattan Bound Roadway Junction Blvd (Exit 20) - Queens Blvd (NY 25) ... NaN NaN NaN
53940 56589030 2023-01-21 22:47:03 2023-01-22 00:04:27 NYPD New York City Police Department Traffic Drag Racing Highway NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN Closed NaN The Police Department reviewed your complaint ... 01/22/2023 12:04:34 AM Unspecified BROOKLYN NaN BROOKLYN NaN NaN ONLINE Unspecified BROOKLYN NaN NaN NaN Belt Pkwy West/Staten Island Bound Roadway Cross Bay Blvd (Exit 17S) - Erskine St (Exit 15) NaN NaN NaN
53985 56587843 2023-01-21 22:55:35 2023-01-24 08:32:23 DOT Department of Transportation Highway Condition Pothole - Highway Highway NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN Closed NaN The Department of Transportation has completed... 01/24/2023 08:32:27 AM Unspecified QUEENS NaN QUEENS NaN NaN ONLINE Unspecified QUEENS NaN NaN NaN Long Island Expwy West/Manhattan Bound Roadway Junction Blvd (Exit 20) - Queens Blvd (NY 25) ... NaN NaN NaN

159 rows × 41 columns

Update Borough from Incident Zip.

Code
# Function that classified the borough
def nyczip2burough(zip):
    nzip = int(zip)
    if nzip >= 10001 and nzip <= 10282:
        return "MANHATTAN"
    elif nzip >= 10301 and nzip <= 10314:
        return "STATEN ISLAND"
    elif nzip >= 10451 and nzip <= 10475:
        return "BRONX"
    elif nzip >= 11004 and nzip <= 11109:
        return "QUEENS"
    elif nzip >= 11351 and nzip <= 11697:
        return "QUEENS"
    elif nzip >= 11201 and nzip <= 11256:
        return "BROOKLYN"
    else:
        return np.nan

Filter_1 = nyc311[nyc311['Borough'] == 'Unspecified']

Filter_1 = Filter_1[Filter_1['Incident Zip'].notnull()]

for index in Filter_1.index:
    burough = nyczip2burough(Filter_1.loc[index,['Incident Zip']])
    Filter_1.loc[index,'Borough'] = burough

#Update the data
for index in Filter_1.index:
  nyc311.loc[index,'Borough']   = Filter_1['Borough'][index]

Get X Coordinate (State Plane)and Y Coordinate (State Plane) using stateplane.

Code
import stateplane

Filter_1 = nyc311[nyc311['X Coordinate (State Plane)'].isnull()]

Filter_1 = Filter_1[Filter_1['Latitude'].notnull()]

for index in Filter_1.index:
    x, y = stateplane.from_lonlat(Filter_1['Longitude'][index],Filter_1['Latitude'][index])
    Filter_1.loc[index,'X Coordinate (State Plane)'] = x
    Filter_1.loc[index,'Y Coordinate (State Plane)'] = y

for index in Filter_1.index:
    nyc311.loc[index,'X Coordinate (State Plane)'] = Filter_1['X Coordinate (State Plane)'][index]
    nyc311.loc[index,'Y Coordinate (State Plane)'] = Filter_1['Y Coordinate (State Plane)'][index]

#Show the filled state plane 
Filter_1
Unique Key Created Date Closed Date Agency Agency Name Complaint Type Descriptor Location Type Incident Zip Incident Address Street Name Cross Street 1 Cross Street 2 Intersection Street 1 Intersection Street 2 Address Type City Landmark Facility Type Status Due Date Resolution Description Resolution Action Updated Date Community Board BBL Borough X Coordinate (State Plane) Y Coordinate (State Plane) Open Data Channel Type Park Facility Name Park Borough Vehicle Type Taxi Company Borough Taxi Pick Up Location Bridge Highway Name Bridge Highway Direction Road Ramp Bridge Highway Segment Latitude Longitude Location
539 56530078 2023-01-15 01:50:16 2023-01-15 02:53:59 NYPD New York City Police Department Noise - Vehicle Car/Truck Music Street/Sidewalk 10468.0 CRESTON AVENUE CRESTON AVENUE EAST 184 STREET EAST 188 STREET EAST 184 STREET EAST 188 STREET BLOCKFACE NaN NaN NaN Closed NaN The Police Department responded to the complai... 01/15/2023 02:54:07 AM Unspecified BRONX NaN BRONX 308780.727609 76670.654502 ONLINE Unspecified BRONX NaN NaN NaN NaN NaN NaN NaN 40.857056 -73.895860 NaN
1197 56537742 2023-01-15 06:57:00 2023-01-15 09:10:00 DOT Department of Transportation Traffic Signal Condition Controller NaN 11429 NaN NaN NaN NaN HEMPSTEAD AVE CROSS ISL PKWY N/B ET INTERSECTION NaN NaN NaN Closed NaN Service Request status for this request is ava... 01/15/2023 09:10:00 AM Unspecified QUEENS NaN QUEENS 322930.699511 60513.172365 UNKNOWN Unspecified QUEENS NaN NaN NaN NaN NaN NaN NaN 40.711285 -73.728634 NaN
1262 56537491 2023-01-15 07:21:21 2023-01-17 12:35:00 DOT Department of Transportation Street Condition Pothole NaN 10026.0 WEST 110 STREET WEST 110 STREET 7 AVENUE LENOX AVENUE NaN NaN BLOCKFACE MANHATTAN NaN NaN Closed NaN The Department of Transportation inspected thi... 01/17/2023 12:35:00 PM 64 MANHATTAN NaN MANHATTAN 303919.915080 70130.517504 UNKNOWN Unspecified MANHATTAN NaN NaN NaN NaN NaN NaN NaN 40.798201 -73.953550 NaN
1294 56535076 2023-01-15 07:34:12 2023-01-17 08:55:00 DOT Department of Transportation Street Condition Pothole NaN 10472.0 WATSON AVENUE WATSON AVENUE LUIS ANGEL SUPA TORRES WAY WHEELER AVENUE NaN NaN BLOCKFACE BRONX NaN NaN Closed NaN The Department of Transportation inspected thi... 01/17/2023 08:55:00 AM 09 BRONX NaN BRONX 307703.790428 71621.339526 UNKNOWN Unspecified BRONX NaN NaN NaN NaN NaN NaN NaN 40.811599 -73.908694 NaN
1345 56533345 2023-01-15 07:54:56 2023-01-17 13:40:00 DOT Department of Transportation Street Condition Pothole NaN 11421.0 WOODHAVEN BOULEVARD WOODHAVEN BOULEVARD 85 ROAD 86 AVENUE NaN NaN BLOCKFACE QUEENS NaN NaN Closed NaN The Department of Transportation inspected thi... 01/17/2023 01:40:00 PM 09 QUEENS NaN QUEENS 311939.070297 58708.106299 UNKNOWN Unspecified QUEENS NaN NaN NaN NaN NaN NaN NaN 40.695264 -73.858745 NaN
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
53690 56588873 2023-01-21 22:09:23 2023-01-23 12:40:00 DOT Department of Transportation Street Condition Pothole NaN 10310.0 HARVEST AVENUE HARVEST AVENUE JIMMY OHANLON WAY BEMENT AVENUE NaN NaN BLOCKFACE STATEN ISLAND NaN NaN Closed NaN The Department of Transportation inspected thi... 01/23/2023 12:40:00 PM 01 STATEN ISLAND NaN STATEN ISLAND 290549.794910 51681.701585 UNKNOWN Unspecified STATEN ISLAND NaN NaN NaN NaN NaN NaN NaN 40.632022 -74.111703 NaN
53699 56585027 2023-01-21 22:10:54 2023-01-21 22:55:15 NYPD New York City Police Department Illegal Parking Posted Parking Sign Violation Street/Sidewalk 10019.0 WEST 56 STREET WEST 56 STREET 11 AVENUE 12 AVENUE 11 AVENUE 12 AVENUE BLOCKFACE NaN NaN NaN Closed NaN The Police Department responded to the complai... 01/21/2023 10:55:20 PM Unspecified MANHATTAN NaN MANHATTAN 191577.244392 212705.074696 PHONE Unspecified MANHATTAN NaN NaN NaN NaN NaN NaN NaN 40.748207 -74.007648 NaN
53819 56585350 2023-01-21 22:31:45 2023-01-23 11:05:00 DOT Department of Transportation Street Condition Pothole NaN 11422.0 143 AVENUE 143 AVENUE 241 STREET 243 STREET NaN NaN BLOCKFACE QUEENS NaN NaN Closed NaN The Department of Transportation inspected thi... 01/23/2023 11:05:00 AM 13 QUEENS NaN QUEENS 312603.129774 81836.115030 UNKNOWN Unspecified QUEENS NaN NaN NaN NaN NaN NaN NaN 40.903521 -73.850421 NaN
53933 56585351 2023-01-21 22:46:04 2023-01-23 08:54:00 DOT Department of Transportation Street Condition Pothole NaN 11374.0 YELLOWSTONE BOULEVARD YELLOWSTONE BOULEVARD 68 ROAD KESSEL STREET NaN NaN BLOCKFACE QUEENS NaN NaN Closed NaN The Department of Transportation inspected thi... 01/23/2023 08:54:00 AM 06 QUEENS NaN QUEENS 312748.028659 60713.155900 UNKNOWN Unspecified QUEENS NaN NaN NaN NaN NaN NaN NaN 40.713307 -73.849133 NaN
53964 56588862 2023-01-21 22:51:53 2023-01-23 09:25:00 DOT Department of Transportation Street Condition Pothole NaN 11691.0 CAFFREY AVENUE CAFFREY AVENUE BEACH 9 STREET READS LANE NaN NaN BLOCKFACE QUEENS NaN NaN Closed NaN The Department of Transportation inspected thi... 01/23/2023 09:25:00 AM 14 QUEENS NaN QUEENS 321546.437293 48380.221430 UNKNOWN Unspecified QUEENS NaN NaN NaN NaN NaN NaN NaN 40.602064 -73.745433 NaN

609 rows × 41 columns

Summary after cleaning the data.

  • There are more missing X Coordinate (State Plane) than Y Coordinate (State Plane).
  • There is duplicate when stating Cross Street 1, Cross Street 2, Intersection Street 1,and Intersection Street 2.
  • Agency from DOT seems commonly mixed up the Closed Date and Created Date.
  • Borough that is unspecified should count as missing.
  • Many Park Facility Name is unspecified should count as missing.

Explore duration distribution

Remove the request not from NYPD

Code
nyc311_NYPD = nyc311[nyc311['Agency'] == 'NYPD']

A = nyc311_NYPD[nyc311_NYPD['Closed Date'].isnull()]

A
Unique Key Created Date Closed Date Agency Agency Name Complaint Type Descriptor Location Type Incident Zip Incident Address Street Name Cross Street 1 Cross Street 2 Intersection Street 1 Intersection Street 2 Address Type City Landmark Facility Type Status Due Date Resolution Description Resolution Action Updated Date Community Board BBL Borough X Coordinate (State Plane) Y Coordinate (State Plane) Open Data Channel Type Park Facility Name Park Borough Vehicle Type Taxi Company Borough Taxi Pick Up Location Bridge Highway Name Bridge Highway Direction Road Ramp Bridge Highway Segment Latitude Longitude Location

Only one data still in progress. We can remove that too. Add a new variable called Duration (hours)

Code
nyc311_NYPD = nyc311_NYPD[nyc311_NYPD['Closed Date'].notnull()]

nyc311_NYPD['Duration (hours)'] = nyc311_NYPD['Closed Date'] - nyc311_NYPD['Created Date']

col = nyc311_NYPD.pop("Duration (hours)")

s_hours = col.dt.total_seconds() / 3600.0

nyc311_NYPD.insert(3,s_hours.name ,s_hours)

nyc311_NYPD['Day type'] = nyc311_NYPD['Created Date'].dt.dayofweek

for index in nyc311_NYPD.index:
    if nyc311_NYPD['Day type'][index] < 5:
       nyc311_NYPD.loc[index,'Day type'] = 'Weekday'
    else:
       nyc311_NYPD.loc[index,'Day type'] = 'Weekend'
       
col = nyc311_NYPD.pop("Day type")

nyc311_NYPD.insert(4,col.name ,col)

nyc311_NYPD
Unique Key Created Date Closed Date Duration (hours) Day type Agency Agency Name Complaint Type Descriptor Location Type Incident Zip Incident Address Street Name Cross Street 1 Cross Street 2 Intersection Street 1 Intersection Street 2 Address Type City Landmark Facility Type Status Due Date Resolution Description Resolution Action Updated Date Community Board BBL Borough X Coordinate (State Plane) Y Coordinate (State Plane) Open Data Channel Type Park Facility Name Park Borough Vehicle Type Taxi Company Borough Taxi Pick Up Location Bridge Highway Name Bridge Highway Direction Road Ramp Bridge Highway Segment Latitude Longitude Location
5 56524984 2023-01-15 00:00:18 2023-01-15 02:37:30 2.620000 Weekend NYPD New York City Police Department Noise - Residential Banging/Pounding Residential Building/House 10467.0 2504 BRONX PARK EAST BRONX PARK EAST MACE AVENUE ALLERTON AVENUE MACE AVENUE ALLERTON AVENUE ADDRESS BRONX BRONX PARK EAST NaN Closed NaN The Police Department responded to the complai... 01/15/2023 02:37:36 AM 11 BRONX 2.044270e+09 BRONX 1020095.0 253941.0 ONLINE Unspecified BRONX NaN NaN NaN NaN NaN NaN NaN 40.863611 -73.870409 (40.8636112787934, -73.87040864262848)
6 56526790 2023-01-15 00:00:21 2023-01-15 01:54:24 1.900833 Weekend NYPD New York City Police Department Noise - Street/Sidewalk Loud Music/Party Street/Sidewalk 11418.0 113-22 BESSEMER STREET BESSEMER STREET 84 AVENUE 114 STREET 84 AVENUE 114 STREET ADDRESS RICHMOND HILL BESSEMER STREET NaN Closed NaN The Police Department responded to the complai... 01/15/2023 01:54:28 AM 09 QUEENS 4.092110e+09 QUEENS 1029284.0 195357.0 MOBILE Unspecified QUEENS NaN NaN NaN NaN NaN NaN NaN 40.702772 -73.837580 (40.70277200816068, -73.83758019372564)
7 56525034 2023-01-15 00:00:50 2023-01-15 01:09:56 1.151667 Weekend NYPD New York City Police Department Illegal Parking Posted Parking Sign Violation Street/Sidewalk 11217.0 50 NEVINS STREET NEVINS STREET SCHERMERHORN STREET STATE STREET SCHERMERHORN STREET STATE STREET ADDRESS BROOKLYN NEVINS STREET NaN Closed NaN The Police Department responded to the complai... 01/15/2023 01:10:00 AM 02 BROOKLYN 3.001728e+09 BROOKLYN 989264.0 189641.0 MOBILE Unspecified BROOKLYN NaN NaN NaN NaN NaN NaN NaN 40.687196 -73.981921 (40.68719608843765, -73.98192072171082)
8 56526234 2023-01-15 00:01:02 2023-01-15 05:47:41 5.777500 Weekend NYPD New York City Police Department Noise - Residential Loud Music/Party Residential Building/House 10472.0 1988 NEWBOLD AVENUE NEWBOLD AVENUE DEAD END EAST 177 STREET DEAD END EAST 177 STREET ADDRESS BRONX NEWBOLD AVENUE NaN Closed NaN The Police Department responded to the complai... 01/15/2023 05:47:47 AM 09 BRONX 2.037940e+09 BRONX 1023295.0 242559.0 ONLINE Unspecified BRONX NaN NaN NaN NaN NaN NaN NaN 40.832358 -73.858906 (40.83235751279932, -73.85890589915455)
9 56530817 2023-01-15 00:01:07 2023-01-15 01:01:26 1.005278 Weekend NYPD New York City Police Department Noise - Residential Banging/Pounding Residential Building/House 10028.0 522 EAST 83 STREET EAST 83 STREET YORK AVENUE EAST END AVENUE YORK AVENUE EAST END AVENUE ADDRESS NEW YORK EAST 83 STREET NaN Closed NaN The Police Department responded to the complai... 01/15/2023 01:01:33 AM 08 MANHATTAN 1.015790e+09 MANHATTAN 998699.0 221149.0 ONLINE Unspecified MANHATTAN NaN NaN NaN NaN NaN NaN NaN 40.773667 -73.947833 (40.773667438961155, -73.94783272698673)
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
54459 56585700 2023-01-21 23:58:48 2023-01-22 01:30:36 1.530000 Weekend NYPD New York City Police Department Noise - Commercial Loud Music/Party Club/Bar/Restaurant 11237.0 112 HARRISON PLACE HARRISON PLACE KNICKERBOCKER AVENUE PORTER AVENUE KNICKERBOCKER AVENUE PORTER AVENUE ADDRESS BROOKLYN HARRISON PLACE NaN Closed NaN The Police Department responded to the complai... 01/22/2023 01:30:45 AM 01 BROOKLYN 3.030040e+09 BROOKLYN 1003876.0 196783.0 PHONE Unspecified BROOKLYN NaN NaN NaN NaN NaN NaN NaN 40.706779 -73.929213 (40.70677887148764, -73.92921256734658)
54461 56588661 2023-01-21 23:59:10 2023-01-22 02:50:54 2.862222 Weekend NYPD New York City Police Department Noise - Residential Loud Music/Party Residential Building/House 11222.0 12 MCGUINNESS BLVD SOUTH MCGUINNESS BLVD SOUTH DRIGGS AVENUE BROOME STREET DRIGGS AVENUE BROOME STREET ADDRESS BROOKLYN MCGUINNESS BLVD SOUTH NaN Closed NaN The Police Department responded to the complai... 01/22/2023 02:50:58 AM 01 BROOKLYN 3.027008e+09 BROOKLYN 998995.0 202512.0 ONLINE Unspecified BROOKLYN NaN NaN NaN NaN NaN NaN NaN 40.722513 -73.946805 (40.72251308480256, -73.94680489829233)
54462 56591186 2023-01-21 23:59:12 2023-01-22 06:41:47 6.709722 Weekend NYPD New York City Police Department Illegal Parking Posted Parking Sign Violation Street/Sidewalk 11205.0 93 SAINT EDWARDS STREET SAINT EDWARDS STREET MONUMENT WALK AUBURN PLACE MONUMENT WALK AUBURN PLACE ADDRESS BROOKLYN ST EDWARDS STREET NaN Closed NaN The Police Department responded to the complai... 01/22/2023 06:41:52 AM 02 BROOKLYN 3.020390e+09 BROOKLYN 990352.0 192303.0 PHONE Unspecified BROOKLYN NaN NaN NaN NaN NaN NaN NaN 40.694502 -73.977995 (40.694501981434954, -73.97799524430377)
54463 56587053 2023-01-21 23:59:24 2023-01-22 06:24:40 6.421111 Weekend NYPD New York City Police Department Noise - Residential Loud Music/Party Residential Building/House 11369.0 32-50 93 STREET 93 STREET 32 AVENUE NORTHERN BOULEVARD 32 AVENUE NORTHERN BOULEVARD ADDRESS EAST ELMHURST 93 STREET NaN Closed NaN The Police Department responded to the complai... 01/22/2023 06:24:56 AM 03 QUEENS 4.014220e+09 QUEENS 1018788.0 215331.0 ONLINE Unspecified QUEENS NaN NaN NaN NaN NaN NaN NaN 40.757643 -73.875333 (40.75764288488996, -73.87533252677208)
54464 56590382 2023-01-22 00:00:00 2023-01-22 04:29:17 4.488056 Weekend NYPD New York City Police Department Noise - Residential Loud Music/Party Residential Building/House 10461.0 3109 ROBERTS AVENUE ROBERTS AVENUE MAHAN AVENUE PARKVIEW AVENUE MAHAN AVENUE PARKVIEW AVENUE ADDRESS BRONX ROBERTS AVENUE NaN Closed NaN The Police Department responded to the complai... 01/22/2023 04:29:21 AM 10 BRONX 2.041770e+09 BRONX 1031760.0 247686.0 ONLINE Unspecified BRONX NaN NaN NaN NaN NaN NaN NaN 40.846388 -73.828280 (40.84638812959795, -73.82828029128923)

21534 rows × 43 columns

Visualize the distribution of uncensored duration by weekdays/weekend and by borough.

Lets check the Borough that is Unspecified.

Code
# Remove the unspecified Borough
A = nyc311_NYPD[ nyc311_NYPD['Borough'] == 'Unspecified']

A
Unique Key Created Date Closed Date Duration (hours) Day type Agency Agency Name Complaint Type Descriptor Location Type Incident Zip Incident Address Street Name Cross Street 1 Cross Street 2 Intersection Street 1 Intersection Street 2 Address Type City Landmark Facility Type Status Due Date Resolution Description Resolution Action Updated Date Community Board BBL Borough X Coordinate (State Plane) Y Coordinate (State Plane) Open Data Channel Type Park Facility Name Park Borough Vehicle Type Taxi Company Borough Taxi Pick Up Location Bridge Highway Name Bridge Highway Direction Road Ramp Bridge Highway Segment Latitude Longitude Location
20800 56548115 2023-01-17 17:30:38 2023-01-17 20:33:33 3.048611 Weekday NYPD New York City Police Department Abandoned Vehicle With License Plate Street/Sidewalk NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN Closed NaN The Police Department responded to the complai... 01/17/2023 08:33:37 PM 0 Unspecified NaN Unspecified NaN 263662.0 MOBILE Unspecified Unspecified NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
20806 56551905 2023-01-17 17:32:13 2023-01-17 19:22:50 1.843611 Weekday NYPD New York City Police Department Noise - Residential Loud Music/Party Residential Building/House NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN Closed NaN The Police Department responded to the complai... 01/17/2023 07:22:54 PM 0 Unspecified NaN Unspecified 1023302.0 NaN ONLINE Unspecified Unspecified NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
20809 56552968 2023-01-17 17:32:50 2023-01-17 19:05:54 1.551111 Weekday NYPD New York City Police Department Illegal Parking Blocked Crosswalk Street/Sidewalk NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN Closed NaN The Police Department responded to the complai... 01/17/2023 07:05:58 PM 0 Unspecified NaN Unspecified NaN 208263.0 MOBILE Unspecified Unspecified NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
20817 56556125 2023-01-17 17:33:49 2023-01-17 19:25:22 1.859167 Weekday NYPD New York City Police Department Noise - Vehicle Engine Idling Street/Sidewalk NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN Closed NaN The Police Department responded and upon arriv... 01/17/2023 07:25:25 PM 0 Unspecified NaN Unspecified NaN NaN ONLINE Unspecified Unspecified NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
21833 56550662 2023-01-17 20:05:01 2023-01-17 20:37:07 0.535000 Weekday NYPD New York City Police Department Illegal Parking Posted Parking Sign Violation Street/Sidewalk NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN Closed NaN The Police Department responded to the complai... 01/17/2023 08:37:12 PM 0 Unspecified NaN Unspecified NaN 161437.0 ONLINE Unspecified Unspecified NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
21834 56546660 2023-01-17 20:06:04 2023-01-17 21:08:39 1.043056 Weekday NYPD New York City Police Department Noise - Residential Loud Talking Residential Building/House NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN Closed NaN The Police Department responded to the complai... 01/17/2023 09:08:44 PM 0 Unspecified NaN Unspecified NaN 244575.0 ONLINE Unspecified Unspecified NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
21839 56553536 2023-01-17 20:06:47 2023-01-18 01:04:24 4.960278 Weekday NYPD New York City Police Department Illegal Parking Posted Parking Sign Violation Street/Sidewalk NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN Closed NaN The Police Department issued a summons in resp... 01/18/2023 01:04:30 AM 0 Unspecified NaN Unspecified NaN 192131.0 MOBILE Unspecified Unspecified NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
21840 56550574 2023-01-17 20:06:52 2023-01-17 20:45:29 0.643611 Weekday NYPD New York City Police Department Noise - Residential Loud Music/Party Residential Building/House NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN Closed NaN The Police Department responded to the complai... 01/17/2023 08:45:33 PM 0 Unspecified NaN Unspecified NaN 243863.0 ONLINE Unspecified Unspecified NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
21859 56550616 2023-01-17 20:08:56 2023-01-18 01:00:28 4.858889 Weekday NYPD New York City Police Department Noise - Residential Loud Music/Party Residential Building/House NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN Closed NaN The Police Department responded to the complai... 01/18/2023 01:00:33 AM 0 Unspecified NaN Unspecified NaN 157201.0 MOBILE Unspecified Unspecified NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
21924 56549002 2023-01-17 20:20:18 2023-01-18 00:22:07 4.030278 Weekday NYPD New York City Police Department Illegal Parking Blocked Hydrant Street/Sidewalk NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN Closed NaN The Police Department issued a summons in resp... 01/18/2023 12:22:12 AM 0 Unspecified NaN Unspecified NaN 193837.0 ONLINE Unspecified Unspecified NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
21946 56550413 2023-01-17 20:22:54 2023-01-17 23:20:07 2.953611 Weekday NYPD New York City Police Department Illegal Parking Commercial Overnight Parking Street/Sidewalk NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN Closed NaN The Police Department responded and upon arriv... 01/17/2023 11:20:13 PM 0 Unspecified NaN Unspecified 1001467.0 NaN MOBILE Unspecified Unspecified NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
21967 56551144 2023-01-17 20:26:42 2023-01-17 22:56:27 2.495833 Weekday NYPD New York City Police Department Illegal Parking Commercial Overnight Parking Street/Sidewalk NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN Closed NaN The Police Department issued a summons in resp... 01/17/2023 10:56:32 PM 0 Unspecified NaN Unspecified NaN 233230.0 MOBILE Unspecified Unspecified NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
21968 56550134 2023-01-17 20:26:53 2023-01-17 21:18:48 0.865278 Weekday NYPD New York City Police Department Illegal Parking Blocked Hydrant Street/Sidewalk NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN Closed NaN The Police Department responded and upon arriv... 01/17/2023 09:18:52 PM 0 Unspecified NaN Unspecified NaN 198674.0 MOBILE Unspecified Unspecified NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
21969 56558443 2023-01-17 20:27:05 2023-01-18 03:28:10 7.018056 Weekday NYPD New York City Police Department Blocked Driveway Partial Access Street/Sidewalk NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN Closed NaN The Police Department responded to the complai... 01/18/2023 03:28:17 AM 0 Unspecified NaN Unspecified NaN 196522.0 ONLINE Unspecified Unspecified NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
21978 56550247 2023-01-17 20:28:15 2023-01-17 20:29:56 0.028056 Weekday NYPD New York City Police Department Illegal Parking Commercial Overnight Parking Street/Sidewalk NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN Closed NaN The Police Department responded to the complai... 01/17/2023 08:30:01 PM 0 Unspecified NaN Unspecified NaN 193454.0 ONLINE Unspecified Unspecified NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
21987 56550597 2023-01-17 20:29:42 2023-01-17 21:36:58 1.121111 Weekday NYPD New York City Police Department Illegal Parking Posted Parking Sign Violation Street/Sidewalk NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN Closed NaN The Police Department responded to the complai... 01/17/2023 09:37:05 PM 0 Unspecified NaN Unspecified NaN 235106.0 PHONE Unspecified Unspecified NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
22243 56548104 2023-01-17 21:16:09 2023-01-17 21:59:21 0.720000 Weekday NYPD New York City Police Department Noise - Residential Banging/Pounding Residential Building/House NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN Closed NaN The Police Department responded to the complai... 01/17/2023 09:59:20 PM 0 Unspecified NaN Unspecified NaN 236679.0 ONLINE Unspecified Unspecified NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
22267 56551324 2023-01-17 21:20:19 2023-01-17 22:08:30 0.803056 Weekday NYPD New York City Police Department Drug Activity Use Outside Street/Sidewalk NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN Closed NaN The Police Department responded to the complai... 01/17/2023 10:08:41 PM 0 Unspecified NaN Unspecified NaN 231132.0 ONLINE Unspecified Unspecified NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
22299 56547307 2023-01-17 21:25:04 2023-01-17 22:07:03 0.699722 Weekday NYPD New York City Police Department Drug Activity Use Outside Street/Sidewalk NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN Closed NaN The Police Department responded to the complai... 01/17/2023 10:07:10 PM 0 Unspecified NaN Unspecified NaN 231186.0 ONLINE Unspecified Unspecified NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
22325 56549602 2023-01-17 21:29:14 2023-01-17 22:06:03 0.613611 Weekday NYPD New York City Police Department Non-Emergency Police Matter Other (complaint details) Street/Sidewalk NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN Closed NaN The Police Department responded to the complai... 01/17/2023 10:06:07 PM 0 Unspecified NaN Unspecified NaN 231199.0 ONLINE Unspecified Unspecified NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
22328 56549103 2023-01-17 21:30:23 2023-01-18 03:50:11 6.330000 Weekday NYPD New York City Police Department Abandoned Vehicle With License Plate Street/Sidewalk NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN Closed NaN The Police Department responded to the complai... 01/18/2023 03:50:20 AM 0 Unspecified NaN Unspecified NaN 262939.0 ONLINE Unspecified Unspecified NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
24354 56556951 2023-01-18 08:05:19 2023-01-18 10:35:36 2.504722 Weekday NYPD New York City Police Department Illegal Parking Parking Permit Improper Use Street/Sidewalk NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN Closed NaN The Police Department responded and upon arriv... 01/18/2023 10:35:44 AM 0 Unspecified NaN Unspecified NaN 200321.0 MOBILE Unspecified Unspecified NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
24389 56565138 2023-01-18 08:09:30 2023-01-18 12:26:13 4.278611 Weekday NYPD New York City Police Department Blocked Driveway No Access Street/Sidewalk NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN Closed NaN The Police Department responded to the complai... 01/18/2023 12:26:19 PM 0 Unspecified NaN Unspecified NaN 181780.0 MOBILE Unspecified Unspecified NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
24399 56560741 2023-01-18 08:10:48 2023-01-18 09:30:53 1.334722 Weekday NYPD New York City Police Department Abandoned Vehicle With License Plate Street/Sidewalk NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN Closed NaN The Police Department responded to the complai... 01/18/2023 09:30:33 AM 0 Unspecified NaN Unspecified NaN 221157.0 ONLINE Unspecified Unspecified NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
24411 56558667 2023-01-18 08:11:14 2023-01-18 10:00:22 1.818889 Weekday NYPD New York City Police Department Illegal Parking Blocked Bike Lane Street/Sidewalk NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN Closed NaN The Police Department responded to the complai... 01/18/2023 10:00:27 AM 0 Unspecified NaN Unspecified NaN 208822.0 ONLINE Unspecified Unspecified NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
24417 56558730 2023-01-18 08:12:18 2023-01-18 08:37:57 0.427500 Weekday NYPD New York City Police Department Illegal Parking Blocked Bike Lane Street/Sidewalk NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN Closed NaN The Police Department responded and upon arriv... 01/18/2023 08:38:02 AM 0 Unspecified NaN Unspecified NaN 216116.0 ONLINE Unspecified Unspecified NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
24469 56561431 2023-01-18 08:17:56 2023-01-18 10:35:37 2.294722 Weekday NYPD New York City Police Department Illegal Parking Parking Permit Improper Use Street/Sidewalk NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN Closed NaN The Police Department responded and upon arriv... 01/18/2023 10:35:43 AM 0 Unspecified NaN Unspecified NaN 200321.0 MOBILE Unspecified Unspecified NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
24473 56558908 2023-01-18 08:18:14 2023-01-18 11:34:52 3.277222 Weekday NYPD New York City Police Department Illegal Parking Posted Parking Sign Violation Street/Sidewalk NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN Closed NaN The Police Department responded to the complai... 01/18/2023 11:34:56 AM 0 Unspecified NaN Unspecified NaN 214317.0 MOBILE Unspecified Unspecified NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
24480 56564260 2023-01-18 08:19:05 2023-01-18 08:46:57 0.464444 Weekday NYPD New York City Police Department Noise - Residential Loud Music/Party Residential Building/House NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN Closed NaN Your request can not be processed at this time... 01/18/2023 08:47:02 AM 0 Unspecified NaN Unspecified 998880.0 NaN ONLINE Unspecified Unspecified NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
25869 56560175 2023-01-18 10:59:27 2023-01-18 16:11:06 5.194167 Weekday NYPD New York City Police Department Abandoned Vehicle With License Plate Street/Sidewalk NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN Closed NaN The Police Department responded to the complai... 01/18/2023 04:11:11 PM 0 Unspecified NaN Unspecified NaN NaN PHONE Unspecified Unspecified NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
26259 56560760 2023-01-18 11:46:21 2023-01-18 13:37:17 1.848889 Weekday NYPD New York City Police Department Blocked Driveway Partial Access Street/Sidewalk NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN Closed NaN The Police Department responded and upon arriv... 01/18/2023 01:37:22 PM 0 Unspecified NaN Unspecified NaN 157474.0 ONLINE Unspecified Unspecified NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
27250 56565190 2023-01-18 13:22:17 2023-01-18 21:31:12 8.148611 Weekday NYPD New York City Police Department Illegal Parking Unauthorized Bus Layover Street/Sidewalk NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN Closed NaN The Police Department responded to the complai... 01/18/2023 09:31:16 PM 0 Unspecified NaN Unspecified NaN 182459.0 MOBILE Unspecified Unspecified NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
27273 56560432 2023-01-18 13:24:07 2023-01-18 22:17:26 8.888611 Weekday NYPD New York City Police Department Noise - Residential Loud Music/Party Residential Building/House NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN Closed NaN The Police Department responded to the complai... 01/18/2023 10:17:30 PM 0 Unspecified NaN Unspecified NaN 248717.0 MOBILE Unspecified Unspecified NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
27297 56564601 2023-01-18 13:25:52 2023-01-18 13:59:59 0.568611 Weekday NYPD New York City Police Department Illegal Parking Blocked Crosswalk Street/Sidewalk NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN Closed NaN The Police Department responded to the complai... 01/18/2023 02:00:04 PM 0 Unspecified NaN Unspecified NaN 249838.0 MOBILE Unspecified Unspecified NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN

The remaining borough that is Unspecified does not have Incident Zip which merging with uszipcode does not help to get more info about the data.

Code
# Remove the unspecified Borough
nyc311_NYPD = nyc311_NYPD[ nyc311_NYPD['Borough'] != 'Unspecified']

# Remove the zipcode that is null due to missing latitude and longitude
nyc311_NYPD = nyc311_NYPD[nyc311_NYPD['Incident Zip'].notnull()]

from plotnine import *

(
    ggplot(nyc311_NYPD, aes(x='Borough',y = 'Duration (hours)', fill='Day type'))
    + geom_boxplot()
)

<ggplot: (8765089180880)>

Test whether the distributions are the same across weekdays/weekends.

Code
from scipy.stats import mannwhitneyu

# Create the subset of data
weekday_dur = nyc311_NYPD.loc[nyc311_NYPD['Day type'] == 'Weekday', 'Duration (hours)']

weekend_dur = nyc311_NYPD.loc[nyc311_NYPD['Day type'] == 'Weekend', 'Duration (hours)']

# Perform the weekday/weekend comparison
weekday_weekend_pvalue = mannwhitneyu(weekday_dur, weekend_dur).pvalue

print(weekday_weekend_pvalue)
0.017678279127635112

The null hypothesis is that there is no difference between the distribution of the durition between weekdays and weekends. We use Mann-Whitney U test to test this hypothesis. The p value is less than 0.05 so the null hypothesis is rejected, and thus the distribution is different accross weekdays and weekends.

Test whether the distributions are the same across borough.

Code
# Create the subset of the data
from scipy.stats import kruskal

group1 =nyc311_NYPD[nyc311_NYPD['Borough'] == 'BROOKLYN']['Duration (hours)']
group2 = nyc311_NYPD[nyc311_NYPD['Borough'] == 'QUEENS']['Duration (hours)']
group3 = nyc311_NYPD[nyc311_NYPD['Borough'] == 'MANHATTAN']['Duration (hours)']
group4 = nyc311_NYPD[nyc311_NYPD['Borough'] == 'BRONX']['Duration (hours)']
group5 = nyc311_NYPD[nyc311_NYPD['Borough'] == 'STATEN ISLAND']['Duration (hours)']

statistic, p_value = kruskal(group1, group2, group3, group4, group5)

print('Borough p-values:', p_value)
Borough p-values: 0.0

The null hypothesis is that there is no significant median difference in the distribution of the duration across the borough. We use Kruskal-Wallis test to test this hypothesis. The p value is less than 0.05 so the null hypothesis is rejected, suggesting that the medians are different across borough.

Lets merge with uszipcode data.

Code
import os
# set the default database file location
db_file = os.path.abspath("simple_db.sqlite")

import sqlite3
import pandas as pd
# change to your own path after installing uszipcode
con = sqlite3.connect(db_file)
zipdf = pd.read_sql_query("SELECT * from simple_zipcode", con)
zipdf.info()

# Change the type to float64
zipdf['zipcode'] = zipdf['zipcode'].astype('float64')

# Change the type to float64 too
nyc311_NYPD['Incident Zip'] = nyc311_NYPD['Incident Zip'].astype('float64')

merged_df = pd.merge(nyc311_NYPD, zipdf, how = 'left', left_on = 'Incident Zip', right_on = 'zipcode')

# Drop repeating data
merged_df = merged_df.drop(['zipcode','lat','lng'],axis = 1)
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 42724 entries, 0 to 42723
Data columns (total 24 columns):
 #   Column                   Non-Null Count  Dtype  
---  ------                   --------------  -----  
 0   zipcode                  42724 non-null  object 
 1   zipcode_type             42724 non-null  object 
 2   major_city               42724 non-null  object 
 3   post_office_city         33104 non-null  object 
 4   common_city_list         41877 non-null  object 
 5   county                   42724 non-null  object 
 6   state                    42724 non-null  object 
 7   lat                      42724 non-null  float64
 8   lng                      42724 non-null  float64
 9   timezone                 42724 non-null  object 
 10  radius_in_miles          33104 non-null  float64
 11  area_code_list           42724 non-null  object 
 12  population               31448 non-null  float64
 13  population_density       31448 non-null  float64
 14  land_area_in_sqmi        31448 non-null  float64
 15  water_area_in_sqmi       31448 non-null  float64
 16  housing_units            31448 non-null  float64
 17  occupied_housing_units   31448 non-null  float64
 18  median_home_value        31448 non-null  float64
 19  median_household_income  31448 non-null  float64
 20  bounds_west              33104 non-null  float64
 21  bounds_east              33104 non-null  float64
 22  bounds_north             33104 non-null  float64
 23  bounds_south             33104 non-null  float64
dtypes: float64(15), object(9)
memory usage: 7.8+ MB

Displaying merged data.

Code
merged_df
Unique Key Created Date Closed Date Duration (hours) Day type Agency Agency Name Complaint Type Descriptor Location Type Incident Zip Incident Address Street Name Cross Street 1 Cross Street 2 Intersection Street 1 Intersection Street 2 Address Type City Landmark Facility Type Status Due Date Resolution Description Resolution Action Updated Date Community Board BBL Borough X Coordinate (State Plane) Y Coordinate (State Plane) Open Data Channel Type Park Facility Name Park Borough Vehicle Type Taxi Company Borough Taxi Pick Up Location Bridge Highway Name Bridge Highway Direction Road Ramp Bridge Highway Segment Latitude Longitude Location zipcode_type major_city post_office_city common_city_list county state timezone radius_in_miles area_code_list population population_density land_area_in_sqmi water_area_in_sqmi housing_units occupied_housing_units median_home_value median_household_income bounds_west bounds_east bounds_north bounds_south
0 56524984 2023-01-15 00:00:18 2023-01-15 02:37:30 2.620000 Weekend NYPD New York City Police Department Noise - Residential Banging/Pounding Residential Building/House 10467.0 2504 BRONX PARK EAST BRONX PARK EAST MACE AVENUE ALLERTON AVENUE MACE AVENUE ALLERTON AVENUE ADDRESS BRONX BRONX PARK EAST NaN Closed NaN The Police Department responded to the complai... 01/15/2023 02:37:36 AM 11 BRONX 2.044270e+09 BRONX 1020095.0 253941.0 ONLINE Unspecified BRONX NaN NaN NaN NaN NaN NaN NaN 40.863611 -73.870409 (40.8636112787934, -73.87040864262848) STANDARD Bronx Bronx, NY b'x\x9c\x8bVr*\xca\xcf\xabP\x8a\x05\x00\x0e\xb... Bronx County NY America/New_York 2.000000 b'x\x9cS261\xd717\xb4P\x02\x00\x08f\x01\xaf' 97060.0 41649.0 2.33 0.00 37432.0 35524.0 369500.0 36048.0 -73.895710 -73.855691 40.908016 40.857266
1 56526790 2023-01-15 00:00:21 2023-01-15 01:54:24 1.900833 Weekend NYPD New York City Police Department Noise - Street/Sidewalk Loud Music/Party Street/Sidewalk 11418.0 113-22 BESSEMER STREET BESSEMER STREET 84 AVENUE 114 STREET 84 AVENUE 114 STREET ADDRESS RICHMOND HILL BESSEMER STREET NaN Closed NaN The Police Department responded to the complai... 01/15/2023 01:54:28 AM 09 QUEENS 4.092110e+09 QUEENS 1029284.0 195357.0 MOBILE Unspecified QUEENS NaN NaN NaN NaN NaN NaN NaN 40.702772 -73.837580 (40.70277200816068, -73.83758019372564) STANDARD Richmond Hill Richmond Hill, NY b'x\x9c\x8bV\n\xcaL\xce\xc8\xcd\xcfKQ\xf0\xc8\... Queens County NY America/New_York 1.000000 b'x\x9cS261\xd7\xb14\x04b#K\x1dsC\x0b%\x00\x1d... 36256.0 22201.0 1.63 0.00 11854.0 11031.0 427000.0 60691.0 -73.854747 -73.813659 40.711641 40.689442
2 56525034 2023-01-15 00:00:50 2023-01-15 01:09:56 1.151667 Weekend NYPD New York City Police Department Illegal Parking Posted Parking Sign Violation Street/Sidewalk 11217.0 50 NEVINS STREET NEVINS STREET SCHERMERHORN STREET STATE STREET SCHERMERHORN STREET STATE STREET ADDRESS BROOKLYN NEVINS STREET NaN Closed NaN The Police Department responded to the complai... 01/15/2023 01:10:00 AM 02 BROOKLYN 3.001728e+09 BROOKLYN 989264.0 189641.0 MOBILE Unspecified BROOKLYN NaN NaN NaN NaN NaN NaN NaN 40.687196 -73.981921 (40.68719608843765, -73.98192072171082) STANDARD Brooklyn Brooklyn, NY b'x\x9c\x8bVr*\xca\xcf\xcf\xce\xa9\xccS\x8a\x0... Kings County NY America/New_York 0.909091 b'x\x9cS231\xd317\xb4P\x02\x00\x08x\x01\xb1' 35881.0 47796.0 0.75 0.00 17581.0 16269.0 857000.0 81862.0 -73.990867 -73.970640 40.689901 40.672639
3 56526234 2023-01-15 00:01:02 2023-01-15 05:47:41 5.777500 Weekend NYPD New York City Police Department Noise - Residential Loud Music/Party Residential Building/House 10472.0 1988 NEWBOLD AVENUE NEWBOLD AVENUE DEAD END EAST 177 STREET DEAD END EAST 177 STREET ADDRESS BRONX NEWBOLD AVENUE NaN Closed NaN The Police Department responded to the complai... 01/15/2023 05:47:47 AM 09 BRONX 2.037940e+09 BRONX 1023295.0 242559.0 ONLINE Unspecified BRONX NaN NaN NaN NaN NaN NaN NaN 40.832358 -73.858906 (40.83235751279932, -73.85890589915455) STANDARD Bronx Bronx, NY b'x\x9c\x8bVr*\xca\xcf\xabP\x8a\x05\x00\x0e\xb... Bronx County NY America/New_York 1.000000 b'x\x9cS27\xb4P\x02\x00\x02\xb0\x00\xe5' 66358.0 63167.0 1.05 0.00 23386.0 22002.0 429600.0 30288.0 -73.884568 -73.845633 40.836137 40.823160
4 56530817 2023-01-15 00:01:07 2023-01-15 01:01:26 1.005278 Weekend NYPD New York City Police Department Noise - Residential Banging/Pounding Residential Building/House 10028.0 522 EAST 83 STREET EAST 83 STREET YORK AVENUE EAST END AVENUE YORK AVENUE EAST END AVENUE ADDRESS NEW YORK EAST 83 STREET NaN Closed NaN The Police Department responded to the complai... 01/15/2023 01:01:33 AM 08 MANHATTAN 1.015790e+09 MANHATTAN 998699.0 221149.0 ONLINE Unspecified MANHATTAN NaN NaN NaN NaN NaN NaN NaN 40.773667 -73.947833 (40.773667438961155, -73.94783272698673) STANDARD New York New York, NY b'x\x9c\x8bV\xf2K-W\x88\xcc/\xcaV\x8a\x05\x00\... New York County NY America/New_York 0.852273 b'x\x9cS224\xd2161\xd7131\xd3\xb144W\x02\x00\x... 45141.0 143683.0 0.31 0.00 28139.0 25158.0 1000001.0 104638.0 -73.963230 -73.944337 40.782213 40.770062
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
21483 56585700 2023-01-21 23:58:48 2023-01-22 01:30:36 1.530000 Weekend NYPD New York City Police Department Noise - Commercial Loud Music/Party Club/Bar/Restaurant 11237.0 112 HARRISON PLACE HARRISON PLACE KNICKERBOCKER AVENUE PORTER AVENUE KNICKERBOCKER AVENUE PORTER AVENUE ADDRESS BROOKLYN HARRISON PLACE NaN Closed NaN The Police Department responded to the complai... 01/22/2023 01:30:45 AM 01 BROOKLYN 3.030040e+09 BROOKLYN 1003876.0 196783.0 PHONE Unspecified BROOKLYN NaN NaN NaN NaN NaN NaN NaN 40.706779 -73.929213 (40.70677887148764, -73.92921256734658) STANDARD Brooklyn Brooklyn, NY b'x\x9c\x8bVr*\xca\xcf\xcf\xce\xa9\xccS\x8a\x0... Kings County NY America/New_York 1.000000 b'x\x9cS261\xd717\xb4P\x02\x00\x08f\x01\xaf' 49896.0 50927.0 0.98 0.02 16724.0 15512.0 464500.0 40372.0 -73.934802 -73.902128 40.716245 40.690409
21484 56588661 2023-01-21 23:59:10 2023-01-22 02:50:54 2.862222 Weekend NYPD New York City Police Department Noise - Residential Loud Music/Party Residential Building/House 11222.0 12 MCGUINNESS BLVD SOUTH MCGUINNESS BLVD SOUTH DRIGGS AVENUE BROOME STREET DRIGGS AVENUE BROOME STREET ADDRESS BROOKLYN MCGUINNESS BLVD SOUTH NaN Closed NaN The Police Department responded to the complai... 01/22/2023 02:50:58 AM 01 BROOKLYN 3.027008e+09 BROOKLYN 998995.0 202512.0 ONLINE Unspecified BROOKLYN NaN NaN NaN NaN NaN NaN NaN 40.722513 -73.946805 (40.72251308480256, -73.94680489829233) STANDARD Brooklyn Brooklyn, NY b'x\x9c\x8bVr*\xca\xcf\xcf\xce\xa9\xccS\x8a\x0... Kings County NY America/New_York 1.000000 b'x\x9cS261\xd717\xb4\xd0\xb144W\x02\x00\x11[\... 36934.0 24232.0 1.52 0.08 18756.0 17033.0 726500.0 63739.0 -73.962795 -73.928137 40.739446 40.718088
21485 56591186 2023-01-21 23:59:12 2023-01-22 06:41:47 6.709722 Weekend NYPD New York City Police Department Illegal Parking Posted Parking Sign Violation Street/Sidewalk 11205.0 93 SAINT EDWARDS STREET SAINT EDWARDS STREET MONUMENT WALK AUBURN PLACE MONUMENT WALK AUBURN PLACE ADDRESS BROOKLYN ST EDWARDS STREET NaN Closed NaN The Police Department responded to the complai... 01/22/2023 06:41:52 AM 02 BROOKLYN 3.020390e+09 BROOKLYN 990352.0 192303.0 PHONE Unspecified BROOKLYN NaN NaN NaN NaN NaN NaN NaN 40.694502 -73.977995 (40.694501981434954, -73.97799524430377) STANDARD Brooklyn Brooklyn, NY b'x\x9c\x8bVr*\xca\xcf\xcf\xce\xa9\xccS\x8a\x0... Kings County NY America/New_York 1.000000 b'x\x9cS224R\x02\x00\x02\x90\x00\xda' 40366.0 42508.0 0.95 0.00 16409.0 14690.0 636900.0 44688.0 -73.980729 -73.951444 40.705954 40.687415
21486 56587053 2023-01-21 23:59:24 2023-01-22 06:24:40 6.421111 Weekend NYPD New York City Police Department Noise - Residential Loud Music/Party Residential Building/House 11369.0 32-50 93 STREET 93 STREET 32 AVENUE NORTHERN BOULEVARD 32 AVENUE NORTHERN BOULEVARD ADDRESS EAST ELMHURST 93 STREET NaN Closed NaN The Police Department responded to the complai... 01/22/2023 06:24:56 AM 03 QUEENS 4.014220e+09 QUEENS 1018788.0 215331.0 ONLINE Unspecified QUEENS NaN NaN NaN NaN NaN NaN NaN 40.757643 -73.875333 (40.75764288488996, -73.87533252677208) STANDARD East Elmhurst East Elmhurst, NY b'x\x9c\x8bVrM,.Qp\xcd\xc9\xcd(-*.Q\xd2QPr\xcb... Queens County NY America/New_York 1.000000 b'x\x9cS224R\x02\x00\x02\x90\x00\xda' 38615.0 36170.0 1.07 0.00 12184.0 11194.0 464500.0 53617.0 -73.887671 -73.854413 40.771808 40.755854
21487 56590382 2023-01-22 00:00:00 2023-01-22 04:29:17 4.488056 Weekend NYPD New York City Police Department Noise - Residential Loud Music/Party Residential Building/House 10461.0 3109 ROBERTS AVENUE ROBERTS AVENUE MAHAN AVENUE PARKVIEW AVENUE MAHAN AVENUE PARKVIEW AVENUE ADDRESS BRONX ROBERTS AVENUE NaN Closed NaN The Police Department responded to the complai... 01/22/2023 04:29:21 AM 10 BRONX 2.041770e+09 BRONX 1031760.0 247686.0 ONLINE Unspecified BRONX NaN NaN NaN NaN NaN NaN NaN 40.846388 -73.828280 (40.84638812959795, -73.82828029128923) STANDARD Bronx Bronx, NY b'x\x9c\x8bVr*\xca\xcf\xabP\x8a\x05\x00\x0e\xb... Bronx County NY America/New_York 1.000000 b'x\x9cS261\xd717\xb4\xd0\xb144\xd7\xb14\xb2T\... 50502.0 21045.0 2.40 0.00 21127.0 19772.0 454700.0 52347.0 -73.861920 -73.824957 40.861021 40.830599

21488 rows × 64 columns

Build models to predict over3h

Data preparation

Code
import pandas as pd
import numpy as np

# load data 
train = pd.read_csv("nyc311_NYPD_merged.csv")
test = pd.read_csv("nyc311_NYPD_test_merged.csv")

# Create new variable over3h.
train['over3h'] = train['Duration (hours)'].\
    apply(lambda x: 1 if x >= 3 else 0)
test['over3h'] = test['Duration (hours)'].\
    apply(lambda x: 1 if x >= 3 else 0)

Construct an hour variable with integer values from 0 to 23.

Code
train['hour'] = pd.to_datetime(train['Created Date']).dt.hour
test['hour'] = pd.to_datetime(test['Created Date']).dt.hour

Construct a house_income_ratio represents the ratio of median_home_value to median_household_income.

Code
train['house_income_ratio'] = train['median_home_value'] / train['median_household_income']
test['house_income_ratio'] = test['median_home_value'] / train['median_household_income']

Drop cases with missing values for certain columns.

Code
train = train.dropna(subset=['hour', 'Day type', 'Complaint Type', 'Community Board',\
                 'Resolution Description', 'Open Data Channel Type','population_density',\
                 'median_home_value', 'house_income_ratio'])
test = test.dropna(subset=['hour', 'Day type', 'Complaint Type', 'Community Board',\
                 'Resolution Description', 'Open Data Channel Type','population_density',\
                 'median_home_value', 'house_income_ratio'])

When using the fitted model to predict the test data, we noticed that there are certain features that appear only in the testing data (3 cases) and certain features that appear only in the training data (6 cases). Therefore, we need to revisit this step and drop these cases before proceeding with the fit and prediction.

Drop cases with certain values for certain columns

Code
test = test.drop(test[test['Community Board'] == '28 BRONX'].index)
test = test.drop(test[test['Community Board'] == '80 QUEENS'].index)
test = test.drop(test[test['Complaint Type'] == 'Squeegee'].index)
train = train.drop(train[train['Community Board'] == '27 BRONX'].index)
train = train.drop(train[train['Community Board'] == '81 QUEENS'].index)
train = train.drop(train[train['Resolution Description'] == \
'Your complaint has been received by the Police Department and additional information will be available later.'].index)
train = train.drop(train[train['Open Data Channel Type'] == 'UNKNOWN'].index)

Get the training data

Code
# Separate the categorical and continuous variables into separate dataframes.
train_categorical_variables = train[['hour', 'Day type', 'Complaint Type', 'Community Board',\
                 'Resolution Description', 'Open Data Channel Type']]
train_continuous_variables = train[['population_density', 'median_home_value', 'house_income_ratio']]

# Perform one-hot encoding on the categorical variables
train_categorical_encoded = pd.get_dummies(train_categorical_variables)
# Combine the one-hot encoded categorical variables with the continuous variables
X_train = pd.concat([train_categorical_encoded, train_continuous_variables], axis=1)
print(X_train.shape) 
(21354, 114)

Get the testing data

Code
# separate the categorical and continuous variables into separate dataframes
test_categorical_variables = test[['hour', 'Day type', 'Complaint Type', 'Community Board',\
                 'Resolution Description', 'Open Data Channel Type']]
test_continuous_variables = test[['population_density', 'median_home_value', 'house_income_ratio']]

# Perform one-hot encoding on the categorical variables
test_categorical_encoded = pd.get_dummies(test_categorical_variables)

# Combine the one-hot encoded categorical variables with the continuous variables
X_test = pd.concat([test_categorical_encoded, test_continuous_variables], axis=1)

print(X_test.shape)
(19253, 114)
Code
# Get y for both the training and test data
y_train = train['over3h'].values
y_test = test['over3h'].values

Import and fit the models

Code
from sklearn.svm import SVC

# Fit SVM
svm = SVC()
svm.fit(X_train, y_train)
SVC()
In a Jupyter environment, please rerun this cell to show the HTML representation or trust the notebook.
On GitHub, the HTML representation is unable to render, please try loading this page with nbviewer.org.
Code
from sklearn import tree

# Fit decision tree
tree1 = tree.DecisionTreeClassifier()
tree1.fit(X_train, y_train)
DecisionTreeClassifier()
In a Jupyter environment, please rerun this cell to show the HTML representation or trust the notebook.
On GitHub, the HTML representation is unable to render, please try loading this page with nbviewer.org.
Code
tree1.tree_.node_count
7325

Evaluation of the models

Code
# caculate the predicted values
svm_pred = svm.predict(X_test)
tree1_pred = tree1.predict(X_test)
Code
# evaluate the model with default parameters
from sklearn.metrics import confusion_matrix, \
accuracy_score, precision_score, recall_score, f1_score, roc_auc_score

# Confusion matrix
svm_cm = confusion_matrix(y_test, svm_pred)
tree1_cm = confusion_matrix(y_test, tree1_pred)

# Accuracy
svm_acc = accuracy_score(y_test, svm_pred)
tree1_acc = accuracy_score(y_test, tree1_pred)

# Precision
svm_precision = precision_score(y_test, svm_pred, zero_division=1)
tree1_precision = precision_score(y_test, tree1_pred)

# Recall
svm_recall = recall_score(y_test, svm_pred)
tree1_recall = recall_score(y_test, tree1_pred)

# F1-score
svm_f1 = f1_score(y_test, svm_pred)
tree1_f1 = f1_score(y_test, tree1_pred)

# AUC
svm_auc = roc_auc_score(y_test, svm_pred)
tree1_auc = roc_auc_score(y_test, tree1_pred)
Code
print("SVM results:")
print("Confusion matrix:")
print(svm_cm)
print("Accuracy:", svm_acc)
print("Precision:", svm_precision)
print("Recall:", svm_recall)
print("F1-score:", svm_f1)
print("AUC:", svm_auc)

print("decision tree results:")
print("Confusion matrix:")
print(tree1_cm)
print("Accuracy:", tree1_acc)
print("Precision:", tree1_precision)
print("Recall:", tree1_recall)
print("F1-score:", tree1_f1)
print("AUC:", tree1_auc)
SVM results:
Confusion matrix:
[[15471     0]
 [ 3782     0]]
Accuracy: 0.8035630810782736
Precision: 1.0
Recall: 0.0
F1-score: 0.0
AUC: 0.5
decision tree results:
Confusion matrix:
[[13641  1830]
 [ 2298  1484]]
Accuracy: 0.7855918558146783
Precision: 0.44779722389861193
Recall: 0.39238498149127443
F1-score: 0.41826381059751966
AUC: 0.6370495781995832

Here are the syntax we tried to optimize svm parameters

from sklearn.preprocessing import StandardScaler

# scale the continuous variables 

scaler = StandardScaler()
X_train_cont_scaled = scaler.fit_transform(train_continuous_variables)
X_test_cont_scaled = scaler.transform(test_continuous_variables)
X_train_cont_scaled = pd.DataFrame(X_train_cont_scaled, columns=['scaled1', 'scaled2', 'scaled3'])
X_test_cont_scaled = pd.DataFrame(X_test_cont_scaled, columns=['scaled1', 'scaled2', 'scaled3'])
train_categorical_encoded = train_categorical_encoded.reset_index()
test_categorical_encoded = test_categorical_encoded.reset_index()
X_train_s = pd.concat([train_categorical_encoded, X_train_cont_scaled], axis=1)
X_test_s = pd.concat([test_categorical_encoded, X_test_cont_scaled], axis=1)

# Fit SVM with scaled variables
svm2 = SVC()
svm2.fit(X_train_s, y_train)

# define the hyperparameter space to search over for svm
from sklearn.model_selection import GridSearchCV
param_grid = {'C': [0.001, 0.01, 0.1],
              'gamma': [0.01, 0.1, 1, 'scale', 'auto'],
              'kernel': ['linear', 'rbf', 'sigmoid']}

# perform cross-validation with GridSearchCV
grid_search = GridSearchCV(svm2, param_grid, cv=5, scoring='f1')

# fit the GridSearchCV object to the training data
grid_search.fit(X_train_s, y_train)

# print the best hyperparameters found
print("Best hyperparameters: ", grid_search.best_params_)

We observed that the SVM model performed poorly with the default parameters. To improve its performance, we decided to scale the continuous variables and use cross-validation to find the optimal hyperparameters. However, the SVM model took too long to train and did not converge even after running for half an hour.

In contrast, the decision tree model provided accurate predictions with the default parameters and was much more efficient than the SVM model. Therefore, we decided to use the decision tree model for further optimazition.

Decision tree parameter tuning

Code
# define the hyperparameter grid 

from sklearn.model_selection import GridSearchCV

param_grid = {'criterion': ['gini', 'entropy'],
              'min_impurity_decrease': [0, 1e-5, 1e-4, 1e-3],
              'ccp_alpha': [0.0, 1e-5, 1e-4, 1e-3, 0.01, 0.1]}

# perform cross-validation with GridSearchCV
grid_search = GridSearchCV(tree1, param_grid, cv=5, scoring='roc_auc')

# fit the GridSearchCV object to the training data
grid_search.fit(X_train, y_train)

# print the best hyperparameters found
grid_search.best_params_
{'ccp_alpha': 0.001, 'criterion': 'entropy', 'min_impurity_decrease': 0}
Code
# Use parameters from cross-validation to train another model
tree2 = tree.DecisionTreeClassifier(criterion='entropy', ccp_alpha=0.001, min_impurity_decrease=0)
tree2 = tree2.fit(X_train, y_train)
tree2.tree_.node_count
69
Code
# caculate the predicted values
tree2_pred = tree2.predict(X_test)

# evaluate the model 

# Confusion matrix
tree2_cm = confusion_matrix(y_test, tree2_pred)

# Accuracy
tree2_acc = accuracy_score(y_test, tree2_pred)

# Precision
tree2_precision = precision_score(y_test, tree2_pred)

# Recall
tree2_recall = recall_score(y_test, tree2_pred)

# F1-score
tree2_f1 = f1_score(y_test, tree2_pred)

# AUC
tree2_auc = roc_auc_score(y_test, tree2_pred)
Code
print("decision tree2 results:")
print("Confusion matrix:")
print(tree2_cm)
print("Accuracy:", tree2_acc)
print("Precision:", tree2_precision)
print("Recall:", tree2_recall)
print("F1-score:", tree2_f1)
print("AUC:", tree2_auc)
decision tree2 results:
Confusion matrix:
[[15131   340]
 [ 3012   770]]
Accuracy: 0.8258972627642446
Precision: 0.6936936936936937
Recall: 0.20359598096245374
F1-score: 0.3147996729354048
AUC: 0.5908096897896102

Summary for model selection

Cross-validation was used to optimize the tuning parameters for building the second decision tree model with the objective of minimizing overfitting and maximizing AUC. While the second model (tree2) performed better on accuracy and precision, it did not perform as well as the first model (tree1) on other metrics such as recall, F1 score, and AUC. However, tree2 had significantly fewer nodes than tree1. Overall, both models performed adequately and the choice of model would depend on the specific metric that is more important for the task at hand. For instance, if higher precision is desired, then tree1 would be the preferred choice.

Research question of interest and exploration

Our research question is what suggestions we can give for individuals (friends or clients) seeking to purchase a neighbor-friendly house in NYC. Our analysis involved exploring factors such as total complains, noise complaints, parking violations, NYPD location, and median house value.

Code
import geopandas as gpd
import matplotlib.pyplot as plt
import pandas as pd
from plotnine import *
from plotnine.data import *
import numpy as np
%matplotlib inline
from shapely.geometry import Point 
Code
# Load data and select some columns
nyc_merged = pd.read_csv('nyc311_NYPD_merged.csv')
comp_df = nyc_merged[['Complaint Type', 'Incident Zip', 'Borough', 'Latitude', 'Longitude', 'median_home_value']].copy()
comp_df['comp_type'] = comp_df['Complaint Type'].astype('category').cat.codes
comp_df['zipstr'] = comp_df['Incident Zip'].apply(lambda f: str(int(f)))
comp_df
Complaint Type Incident Zip Borough Latitude Longitude median_home_value comp_type zipstr
0 Noise - Residential 10467.0 BRONX 40.863611 -73.870409 369500.0 14 10467
1 Noise - Street/Sidewalk 11418.0 QUEENS 40.702772 -73.837580 427000.0 15 11418
2 Illegal Parking 11217.0 BROOKLYN 40.687196 -73.981921 857000.0 10 11217
3 Noise - Residential 10472.0 BRONX 40.832358 -73.858906 429600.0 14 10472
4 Noise - Residential 10028.0 MANHATTAN 40.773667 -73.947833 1000001.0 14 10028
... ... ... ... ... ... ... ... ...
21483 Noise - Commercial 11237.0 BROOKLYN 40.706779 -73.929213 464500.0 11 11237
21484 Noise - Residential 11222.0 BROOKLYN 40.722513 -73.946805 726500.0 14 11222
21485 Illegal Parking 11205.0 BROOKLYN 40.694502 -73.977995 636900.0 10 11205
21486 Noise - Residential 11369.0 QUEENS 40.757643 -73.875333 464500.0 14 11369
21487 Noise - Residential 10461.0 BRONX 40.846388 -73.828280 454700.0 14 10461

21488 rows × 8 columns

Get an idea of complaint distribution - Plot complaint locations

To get a basic idea of complaint distribution, we can plot the locations of complaints reported in different boroughs. This will help us identify areas with a higher volume of complaints and any patterns or trends that may be specific to certain boroughs. We can do this by creating a map with different boroughs shown in different colors and using smaller markers to represent individual complaints. From the following map, we can see that Staten Island has the least complaint density.

Code
crs = 'epsg:4326' # coordinate reference system
comp_geometry = gpd.points_from_xy(comp_df['Longitude'], comp_df['Latitude'])
comp_gdf = gpd.GeoDataFrame(comp_df, crs=crs, geometry=comp_geometry)
comp_gdf.plot(column = 'Borough',markersize=0.5, legend=True)
<AxesSubplot: >

While Staten Island may have a much lower complaint density than other boroughs, it may have certain disadvantages such as limited transportation options. For example, commuting from Staten Island to Manhattan may be time-consuming and inconvenient for those who need to work in Manhattan. Therefore, we continue to explore the complaint density in different zip codes.

Code
# Load new york borough boundary data
nybb_gdf = gpd.read_file(gpd.datasets.get_path('nybb'))
Code
# zipcodes = gpd.read_file('data/tl_2022_us_zcta520/tl_2022_us_zcta520.shp')
zipcodes = gpd.read_file('nyc_zipcode.geojson')
# Count total complaints per zip code
total_comp_df = comp_df.groupby('zipstr')['comp_type'].count().reset_index(name='total complaints')
total_comp_df
# merge it with zip code area and plot it on the map
merge_total_comps = zipcodes.merge(total_comp_df, left_on='GEOID20', right_on='zipstr')
comp_map = merge_total_comps.explore(column='total complaints', cmap='OrRd', legend=True)

comp_map = nybb_gdf.explore(m=comp_map, color='black', style_kwds={'fill': False})
comp_map
Make this Notebook Trusted to load map: File -> Trust Notebook

Based on the interactive map above (displaying complaint density for different zip codes), it would be advisable to avoid certain areas in Brooklyn and the Bronx.

Explore certain complaint type per zip code

While we have explored some general neighbor-friendly areas by analyzing the total number of complaints, it’s important to consider specific needs of individuals. For instance, if your friends/clients are sensitive to noise and prefer a quiet environment, or if they drive and dislike areas with high incidents of illegal parking, we need to take those specific needs into account when suggesting neighborhoods.

Illegal parking

Code
# Get illegal parking data
illpark_df = comp_zipcode_df[comp_zipcode_df['Complaint Type'] == 'Illegal Parking'].copy()
# merge it with zip code area and plot it on the map
merge_illpark = zipcodes.merge(illpark_df, left_on='GEOID20', right_on='zipstr')
illpark_map = merge_illpark.explore(column='count', cmap='OrRd', legend=True)
illpark_map = nybb_gdf.explore(m=illpark_map, color='black', style_kwds={'fill': False})
illpark_map
Make this Notebook Trusted to load map: File -> Trust Notebook

This map shows that Staten Island and most areas in Queens (avoide the western area) are also good choices for people who need to drive and dislike illegal parking. In addition, Manhattan has relatively low numbers of illegal parking complaints.

Exploring factors that might impact certain types of complaints

Now that we have found some larger areas that we might want to suggest based on different needs, such as Staten Island, the Upper East Sideand, or most areas in Queens. You may still want more specific suggestions for a good place to live. For example, you may be wondering if it is a good idea to live near NYPD as this could potentially reduce noise and illegal activities in the area. Or, you may be curious if higher house prices correlate with having more friendly neighbors (do you need to pay more for a more friendly neighborhood). Let’s take a look.

Live near NYPD?

Load NYPD Precincts data Source: precinct locations are looked up from here Longitudes and latitudes are queried through Google Maps

Use this data to explore if 311 calls have any relationship with NYPD precinct locations

Code
# Add NYPD precinct locations
crs = {'init': 'epsg:4326'} # coordinate reference system
nypd_location = pd.read_csv('nypd_precinct_locations.csv')
nypd_geometry = gpd.points_from_xy(nypd_location['Longitude'], nypd_location['Latitude'])
nypd_gdf = gpd.GeoDataFrame(nypd_location, crs=crs, geometry=nypd_geometry)

nypd_gdf.explore(m = comp_map, marker_type='circle_marker', marker_kwds={'radius': 1}, legend=True)
/home/myc/applications/miniconda3/lib/python3.10/site-packages/pyproj/crs/crs.py:141: FutureWarning: '+init=<authority>:<code>' syntax is deprecated. '<authority>:<code>' is the preferred initialization method. When making the change, be mindful of axis order changes: https://pyproj4.github.io/pyproj/stable/gotchas.html#axis-order-changes-in-proj-6
Make this Notebook Trusted to load map: File -> Trust Notebook

From the eyeball, it is not easy to tell if there is any relationship between the complaint counts and the distance to NYPD. To investigate whether living near NYPD impacts the number of complaints, I created a histogram of the distance between complaint locations and the nearest police station as x axis and the number of complaints as y axis.

Code
# To calculate distance, need to use a coordinate reference system that preserves distance. Here we use UTM.
# reference: https://gis.ny.gov/coordinationprogram/workgroups/wg_1/related/standards/datum.htm
nyc_utm = '+proj=utm +zone=18 +north +ellps=WGS84 +datum=WGS84 +units=m +no_defs'

comp_gdf2 = comp_gdf.to_crs(crs=nyc_utm) # Convert total compliants' GeoDataFrame
nypd_gdf2 = nypd_gdf.to_crs(crs=nyc_utm) # Covert NYPD precinct GeoDataFrame

# Calculate the distance to nearest precinct for each complaint
nearest_res = nypd_gdf2.sindex.nearest(comp_gdf2.geometry, return_all=False, return_distance=True)
dist_to_pd = nearest_res[1] # second element is the distance array

# View the data in a histogram
fig, ax = plt.subplots()
ax.hist(dist_to_pd, bins=100)
ax.set_xlabel('Distance to precinct (m)')
ax.set_ylabel('Number of complaints')
Text(0, 0.5, 'Number of complaints')

From this histogram, our suggestion is: don’t specifically look for houses or apartments that near NYPD.

Buy a more expensive house/apartment?

Code
# Get mean median home value for each zip code and drop NA values
median_homeval_df = comp_df.groupby('zipstr')['median_home_value'].mean().reset_index(name='median home value').dropna()

# merge it with zip code area and plot it on the map
merge_total_comps = zipcodes.merge(median_homeval_df, left_on='GEOID20', right_on='zipstr')
homeval_map = merge_total_comps.explore(column='median home value', cmap='OrRd', legend=True)
homeval_map
Make this Notebook Trusted to load map: File -> Trust Notebook

From the map, we can see that Manhattan and Western Brooklyn have the highest median house value. However, from our earlier exploration, we found that these areas also have high total complaints and high noise complaints. So, it’s not necessarily true that higher house prices correlate with more friendly neighbors. Admittedly, the earlier suggestion was just a general one based on boroughs. However, it is possible that house prices within a borough may moderate the relationship between complaints and house values. To explore this possibility, we can look at scatter plots of complaint counts against median house values for each borough. This will allow us to see if there are any notable trends or patterns within each borough.

Code
### Plot total complaints against mean home value
comp_homeval_df = median_homeval_df.merge(total_comp_df, left_on='zipstr', right_on='zipstr').copy()
comp_homeval_df = comp_homeval_df.merge(comp_df[['zipstr', 'Borough']], on='zipstr')
comp_homeval_df
zipstr median home value total complaints Borough
0 10001 650200.0 74 MANHATTAN
1 10001 650200.0 74 MANHATTAN
2 10001 650200.0 74 MANHATTAN
3 10001 650200.0 74 MANHATTAN
4 10001 650200.0 74 MANHATTAN
... ... ... ... ...
21355 11694 623200.0 59 QUEENS
21356 11694 623200.0 59 QUEENS
21357 11694 623200.0 59 QUEENS
21358 11694 623200.0 59 QUEENS
21359 11697 525600.0 1 QUEENS

21360 rows × 4 columns

Code
(
    ggplot(comp_homeval_df, # The dataset we are using
        aes(x = 'median home value', y='total complaints', fill = 'Borough')) 
        + geom_point(size=3, stroke=0)
)

<ggplot: (8765059332145)>

Based on the scatter plots for each borough, there doesn’t seem to be a clear correlation between the number of complaints and house prices. This suggests that higher house prices may not necessarily lead to more friendly neighbors.

Conclusion

In conclusion, analyzing complaint data can provide valuable insights for people looking to find a good place to live. By plotting complaint locations on a map, we can identify areas with higher complaint volumes and specific patterns or trends that may be unique to certain boroughs or neighborhoods. From the maps and analyses presented, we have identified some general areas that might be friendlier for different needs, such as Staten Island for those who are sensitive to noise or need to drive, or the Upper East Side for those who want a quieter location in Manhattan. However, when it comes to specific questions such as whether living near NYPD or paying higher house prices correlates with more friendly neighbors, the data does not provide clear evidence of any such correlation.